LGMD gene coding for a calcium-dependent protease

The invention relates to the isolated gene coding for a calcium-dependent
protease belonging to the calpain family which, when it is mutated, is a cause of
a disease called Limb-Girdle Muscular Dystrophy (LGMD).

The term limb-girdle muscular dystrophy (LGMD) was first proposed by
Walton and Nattrass (1954) as part of a classification of muscular dystrophies.
LGMD is characterised by progressive symmetrical atrophy and weakness of the
proximal limb muscles and by elevated serum creatine kinase. Muscle biopsies
demonstrate dystrophic lesions and electromyograms show myopathic features.
The symptoms usually begin during the first two decades of life and the disease
gradually worsens, often resulting in loss of walking ability 10 or 20 years after
onset (Bushby, 1994). Yet the precise nosological definition of LGMD still
remains unclear. Consequently, various neuromuscular diseases such as
facioscapulohumeral and Becker muscular dystrophies, and especially spinal
muscular atrophies, have occasionally been classified under this diagnosis. For
example, a recent study (Arikawa et al., 1991) reported that 17% (out of 41) of
LGMD patients showed a dystrophinopathy. These issues highlight the difficulty
of analysing the molecular and genetic defect(s) involved in this pathology.

Attempts to identify the genetic basis of this disease go back over 35 years.
Morton and Chung (1959) estimated that "the frequency of heterozygous carrier
... is 16 per thousand persons". The same authors also stated that "the
segregation analysis gives no evidence on whether these genes in different
families are allelic or at different loci". Both autosomal dominant and recessive
transmission have been reported, the latter being more common, with an
estimated prevalence of 10⁻⁵ (Emery, 1991). The localisation of a gene for a
recessive form on chromosome 15 (LGMD2A, MIM 253600; Beckmann et al.,
1991) provided the definitive proof that LGMD is a specific genetic entity.
Subsequent genetic analyses confirmed this chromosome 15 localisation (Young
et al., 1992; Passos-Bueno et al., 1993), the latter group demonstrating genetic
heterogeneity of this disease. Although a recent study localised a second mutant
gene to chromosome 2 (LGMD2B, MIM 253601; Bashir et al., 1994), there is
evidence that at least one other locus can be involved.

Genetic analyses of the LGMD2 kindreds revealed unexpected findings.
First, genetic heterogeneity was demonstrated in the highly inbred Indiana Amish
community. Second, although the families of the Isle of La Reunion were thought to
represent a genetic isolate, at least 6 different disease haplotypes were
observed, providing evidence against the hypothesis of a single founder effect
(Beckmann et al., 1991) in this inbred population.

The nonspecific nosological definition, the relatively low prevalence and the
genetic heterogeneity of this disorder limit the number of families which can be
used to narrow the genetic boundaries of the LGMD2A interval. Cytogenetic
abnormalities, which could have helped to focus on a particular region, have not
been reported. Immunogenetic studies of dystrophin-associated proteins
(Matsumura et al., 1993) and of cytoskeletal or extracellular matrix proteins such as
merosin (Tome et al., 1994) failed to demonstrate any deficiency. In addition,
there is no known specific physiological feature or animal model that could help
to identify a candidate gene. Thus, there is no alternative to a positional cloning
strategy.

It is established that the LGMD2 chromosomal region is localised on
chromosome 15, in the 15q15.1-15q21.1 region (Fougerousse et al., 1994).

Construction and analysis of a 10-12 Mb YAC contig (Fougerousse et al.,
1994) permitted the mapping of 33 polymorphic markers within this interval and
a further narrowing of the LGMD2A region to between D15S514 and D15S222.
Furthermore, extensive analysis of linkage disequilibrium suggested a likely
position for the gene in the proximal part of the contig.

The invention results from the construction of a partial cosmid map and from
screening by cDNA selection (Lovett et al., 1991; Tagle et al., 1993) for muscle-
expressed sequences encoded by this interval, which led to the identification of a
number of potential candidate genes. One of these, previously cloned by
Sorimachi et al. (1989), encodes a muscle-specific protein, nCL1 (novel Calpain
Large subunit 1), which belongs to the calpain family (CANP, calcium-activated
neutral protease; EC 3.4.22.17) and appeared to be a functional candidate gene
for this disease.
Calpains are non-lysosomal intracellular cysteine proteases which require
calcium for their catalytic activity (for a review see Croall D.E. et al., 1991). The
mammalian calpains include two ubiquitous proteins, CANP1 and CANP2, as well
as tissue-specific proteins. In addition to the muscle-specific nCL1, the stomach-
specific nCL2 and nCL2' proteins have also been described; these are derived
from the same gene by alternative splicing. The ubiquitous enzymes are
heterodimers with distinct large subunits associated with a common small
subunit; the association of tissue-specific large subunits with a small subunit
has not yet been demonstrated. The large subunits of calpains can be
subdivided into four protein domains. Domains I and III, whose functions remain
unknown, show no homology with known proteins. Domain I, however, seems
important for the regulation of the proteolytic activity. Domain II shows similarity
with other cysteine proteases, sharing the histidine, cysteine and asparagine
residues of the active site. Domain IV comprises four EF-hand structures which
are potential calcium-binding sites. In addition, three unique regions with no
known homology are present in the muscle-specific nCL1 protein, namely NS,
IS1 and IS2, the latter containing a nuclear translocation signal. These regions
may be important for the muscle-specific function of nCL1.

It is usually accepted that muscular dystrophies are associated with excess
or deregulated calpains, and all the known approaches for treating these diseases
rely on antagonists of these proteases; examples are disclosed in EP
359309 or EP 525420.

The invention results from the finding that, contrary to all these
hypotheses, the LGMD2 disease is strongly correlated with the defect of a calpain
which is expressed in healthy people.

The invention relates to the nucleic acid sequence as represented in [...]
involved in LGMD2 disease, and more precisely LGMD2A. It also relates to a part of this
sequence, provided it is able to code for a protein having a calcium-dependent
protease activity involved in LGMD2, or to a sequence derived from one of the
above sequences by substitution, deletion or addition of one or more nucleotides,
provided that said sequence still codes for said protein, and to all the nucleic acids
yielding a sequence complementary to a sequence as defined above.
The genomic organisation of the human nCL1 gene has been determined by
the inventors; the gene consists of 24 exons and extends over 40 kb, as represented
in Figure 8, and is also a part of the invention. About 35 kb of this gene have
been sequenced. A systematic screening of this gene in LGMD2A families led to
the identification of 14 different mutations, establishing that a number of
independent mutational events in nCL1 are responsible for LGMD2A.
Furthermore, this is the first demonstration of a muscular dystrophy resulting
from an enzymatic rather than a structural defect.

In the present specification, CANP3 means the protein which is a Ca2+-
dependent protease, or calpain, encoded by the nCL1 gene on chromosome
15.

The invention also relates to a protein, called CANP3, consisting of the
amino acid sequence as represented in Figure 2, which is involved,
when mutated, in the LGMD2 disease.
The cDNA of the gene coding for CANP3, which codes for the protein, is
also represented in Figure 2, and is a part of the invention.

The protein encoded by this DNA is CANP3, a calcium-dependent protease
belonging to the calpain family.

Also included in the present invention are the nucleic acid sequences
derived from the cDNA of Figure 2 by one or more substitutions, deletions or
insertions, or by mutations in the 5' or 3' non-coding regions or in splice sites,
provided that the translated protein has the calcium-dependent protease
activity and, when mutated, induces LGMD2 disease.

The nucleic acid sequence encoding the protein may be DNA or RNA, or may
be complementary to the nucleic acid sequence represented in Figure 2.

The invention also relates to a recombinant vector including a DNA 
sequence of the invention, under the control of a promoter allowing the 
expression of the calpain in an appropriate host cell. 

A prokaryotic or eukaryotic host cell transformed or transfected with a
DNA sequence comprising all or part of the sequence of Figure 2 is a part of the
invention.

Such a host cell may be either:
- a cell which is able to secrete the protein; this recombinant protein
may be used as a drug to treat LGMD2; or
- a packaging cell line transfected with a viral or retroviral vector; the cell
lines bearing the recombinant vector may be used as a drug for gene therapy of
LGMD2.

All the systems used today for gene therapy, including adenoviruses,
retroviruses and others described, for example, in « L'ADN médicament » (John
Libbey Eurotext, 1993), and bearing one of the DNA sequences of the invention
are included herein by reference.
The examples hereunder and the attached figures indicate how the structure of
the gene was established, and how the relationship between the gene and
LGMD was established.



Legend of the figures
Figure 1:

A) Genomic organisation of the nCL1 gene

The gene covers a 40 kb region, of which 35 kb were sequenced (accession
number pending); introns and exons are drawn to scale, the latter being
indicated by numbered vertical bars. The first intron is the largest one and
remains to be fully sequenced. Positions of intragenic microsatellites are indicated
by asterisks. Arrows indicate the orientation of Alu (closed) and Mer2 (greyed)
repeat sequences.

B) EcoRI restriction map

An EcoRI (E) restriction map of this region was established with the help
of cosmids from this region. The location of the nCL1 gene is indicated as a black
bar. The sizes of the corresponding fragments are indicated and are underlined
when determined by sequence analysis.

C) Cosmid map of the nCL1 gene region.

Cosmids were from a cosmid library constructed by subcloning YAC
774G4 (Richard, in preparation) and are presented as lines. Dots on lines
indicate positive STSs (indicated in boxed rectangles). A minimum of three
cosmids cover the entire gene.
Figure 2: Sequence of the human nCL1 cDNA (B), and the flanking 5' (A) and 3'
(C) genomic regions.

A) and C) The polyadenylation signal and the putative CAAT and TATAA sites are
boxed. Putative Sp1 (position -477 to -472) and MEF2 binding sites (-364 to -343)
and a CArG box (-685 to -672) are in bold. The Alu sequence present in the 5'
region is underlined.

B) The corresponding amino acids are shown below the sequence. The coding
sequence between the ATG initiation codon and the TGA stop codon is 2466 bp,
encoding an 821-amino-acid protein. The adenine in the first methionine codon
has been assigned position 1. Locations of introns within the nCL1 gene are
indicated by arrowheads. Nucleotides which differ from the previously published
ones are indicated by asterisks.

Figure 3: Alignments of amino acid sequences of the muscle-specific calpains.

The human nCL1 protein is shown on the first line. The 3 muscle-specific
sequences (NS, IS1 and IS2) are underlined. The second line corresponds to the
rat sequence (Accession no J05121). The third and fourth lines show the deduced
amino acid sequences encoded by pig and bovine Expressed Sequence
Tags (GenBank accession no U05678 and no U07858, respectively). The
amino acid residues which are conserved among all known members of the
calpains are in reverse type. A period indicates that the same amino acid is
present in the sequence. Letters refer to the variant amino acid found in the
homologous sequence. Positions of missense mutations are given as numbers
above the mutated amino acid.

Figure 4: Distribution of the mutations along the nCL1 protein structure.
A) Positions of the 23 introns are indicated by vertical bars in relation to the
corresponding amino acid coordinates.

B) The nCL1 protein is depicted showing the four domains (I, II, III, IV) and
the muscle-specific sequences (NS, IS1 and IS2). The positions of missense
mutations within nCL1 domains are indicated by black dots. The effects of
nonsense and frameshift mutations are illustrated as truncated lines
representing the extent of protein synthesised. Names of the corresponding
families are indicated on the left of the line. The out-of-frame ORF is given by
hatched lines.
Figure 5: Northern blot hybridisation of an nCL1 clone

An mRNA blot (Clontech) containing 2 µg of poly(A)+ RNA from each of
eight human tissues was hybridised with an nCL1 genomic clone spanning exons
20 and 21. The latter detects a 3.6 kb mRNA present only in the lane
corresponding to the skeletal muscle mRNA.

Figure 6: Representative mutations identified by heteroduplex analysis.
Examples of mutation screening by heteroduplex analysis. Pedigree B505
shows the segregation of two different mutations in exon 22.
Figure 7: Homozygous mutations in the nCL1 gene
Detection by sequencing of mutations in exons 2 (a), 8 (b), 13 (c) and 22
(d). Sequences from a healthy control are shown above each mutant sequence.
Asterisks indicate the positions of the mutated nucleotides. The consequences on
codon and amino acid residues are indicated on the left of the figure together
with the name of the family.
Figure 8: Structure of the nCL1 gene

Figure 8A represents the 5' part of the gene with exon 1.
Figure 8B represents the part of the gene including exons 2 to 8.
Figure 8C represents the part of the gene including exon 9.
Figure 8D represents the part of the gene including exons 10 to 24,
including the 3' non-transcribed region.
EXAMPLES
EXAMPLE 1

Localisation of the nCL1 gene within the LGMD2A interval

Detailed genetic and physical maps of the LGMD2A region were
constructed (Fougerousse et al., 1994), following the primary linkage assignment
to 15q (Beckmann et al., 1991). The disease locus was bracketed between the
D15S129 and D15S143 markers, defining the cytogenetic boundaries of the
LGMD2A region as 15q15.1-15q21.1 (Fougerousse et al., 1994). Construction
and analysis of a 10-12 Mb YAC contig (Fougerousse et al., 1994) permitted us
to map 33 polymorphic markers within this interval and to further narrow the
LGMD2A region to between D15S514 and D15S222.
The nCL1 gene had been localised to chromosome 15 by hybridisation with
sorted chromosomes and by Southern hybridisation to DNA from human-mouse
cell hybrids (Ohno et al., 1989). cDNA capture using YACs from the LGMD2A
interval allowed the identification of thirteen positional candidate genes. nCL1
was one of the two transcripts identified that showed muscle-specific expression,
as evidenced by northern blot analysis. The localisation was further confirmed by
STS (Sequence Tagged Site) assays. The primers used for the localisation of the
nCL1 gene are P94in2, P94in13 and pcr6a3, as shown in Figure 1; their
characteristics are defined in Table 1.

Table 1: PCR primers used for localisation of the nCL1 gene.

Primer name   Primer sequences (5'-3')    Position within   Annealing     PCR product size (bp)
                                          the cDNA          temp. (°C)    on cDNA   on genomic DNA
P94in2        ATGGAGCCAACAGAACTGAC        341-360           58            108       1758
              GTATGACTCGGAAAAGAAGGT       428-448
P94in13       TAAGCAAAAGCAGTCCCCAC        1893-1912         58            64        1043
              TTGCTGTTCCTCACTTTCCTC       1936-1956
pcr6a3        GTTTCATCTGCTGCTTCGTT        2342-2361         56            130       818
              CTGGTTCAGGCATACATGGT        2452-2471
P94ex1ter     TTCTTTATGTGGACCCTGAGTT      218-239           55            76        76
              ACGAACTGGATGGGGAACT         275-293

These primers were designed from different parts of the published human
cDNA sequence (Sorimachi et al., 1989) and were used for STS content
screening of DNA from three chromosome 15 somatic cell hybrids and of YACs
from the LGMD2A contig. The results positioned the gene in a region previously
defined as 15q15.1-q21.1 and on 3 YACs (774G4, 926G10, 923G7) localised in
this region. The relative positions of STSs along the LGMD2A contig allowed the
gene to be localised between D15S512 and D15S488, in a candidate region
suggested by linkage disequilibrium studies.
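The product sizes in Table 1 follow from the primer coordinates. The short Python sketch below recomputes them, assuming that each cDNA product runs from the first base of the forward primer to the last base of the reverse primer (inclusive), and reading the excess length of the genomic product as intron sequence lying between the primers. Primer names and coordinates come from Table 1; the helper function names are illustrative only.

```python
# Primer coordinates from Table 1 (cDNA positions numbered from the A of the ATG).
# Assumption: cDNA product = last base of reverse primer - first base of forward primer + 1.
PRIMERS = {
    # name: (forward_start, reverse_end, genomic_product_bp)
    "P94in2": (341, 448, 1758),
    "P94in13": (1893, 1956, 1043),
    "pcr6a3": (2342, 2471, 818),
    "P94ex1ter": (218, 293, 76),
}


def cdna_product_size(forward_start: int, reverse_end: int) -> int:
    """Length of the PCR product on cDNA, inclusive of both primers."""
    return reverse_end - forward_start + 1


def intron_content(cdna_bp: int, genomic_bp: int) -> int:
    """Extra length on genomic DNA, attributed to intron(s) inside the amplicon."""
    return genomic_bp - cdna_bp


if __name__ == "__main__":
    for name, (start, end, genomic) in PRIMERS.items():
        cdna = cdna_product_size(start, end)
        extra = intron_content(cdna, genomic)
        print(f"{name}: cDNA {cdna} bp, genomic {genomic} bp, intronic {extra} bp")
```

On this reading, P94ex1ter gives the same 76 bp product on cDNA and genomic DNA because both primers lie within a single exon (exon 1), whereas the P94in2 amplicon would enclose about 1.65 kb of intron.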

The same primers as above were used to screen a cosmid library from YAC
774G4. A group of 5 cosmids was identified (Fig. 1). Experiments with another
nCL1 primer pair (P94ex1ter, Table 1) established that these cosmids cover all
nCL1 exons except exon 1, and that a second group of 4 cosmids contains this
exon (Fig. 1). A minimal set of three overlapping cosmids (2G8-2B11-1F11)
covers the entire gene (Figure 1). DNA from these cosmids was used to
construct an EcoRI restriction map of this region (Figure 1B).
EXAMPLE 2 

Determination of the nCL1 gene sequence

Most of the sequences were obtained through shotgun sequencing of partial
digests of cosmid 1F11 subcloned in M13 and Bluescript vectors, and by walking
with internal primers. The sequence assembly was made using the XBAP
software of the Staden package (Staden) and was in agreement with the
restriction map of the cosmids. Sequences of exon 1 and adjacent regions were
obtained by sequencing cosmid DNA or PCR products from human genomic
DNA. The first intron is still not fully sequenced, but there is evidence that it may
be 10 to 16 kb in length (based on hybridisation of restriction fragments;
data not shown). The entire gene, including its 5' and 3' regions, is more than 40
kb long, and is shown in Figure 8.
a) The cDNA sequence

The technique used allowed the published human cDNA sequence of nCL1
(Sorimachi et al., 1989) to be completed. It contains the missing 129 bases
corresponding to the N-terminal 43 amino acids (Figure 2). It also differs from the
published sequence at 12 positions, three of which occur at third base positions
of codons and preserve the encoded amino acid sequence. The other 9 differences
lead to changes in amino acid composition (Figure 2). As these different exons were
sequenced repeatedly on at least 10 distinct genomes, we are confident that the
sequence of Fig. 2 represents an authentic sequence and does not contain
minor polymorphic variants. Furthermore, these modifications increase the local
similarity with the rat nCL1 amino acid sequence (Sorimachi), although the
overall similarity is still 94%.
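The distinction drawn above between silent third-position differences and amino-acid-changing differences can be checked mechanically with the standard genetic code. The following is a minimal Python illustration; the 12 differing codons themselves appear only in Figure 2, so the examples use codon changes listed in Table 4 plus one hypothetical silent change (CTG to CTA, both leucine). Function names are illustrative.

```python
# Standard genetic code (NCBI translation table 1); codons are enumerated with
# the bases in the order T, C, A, G at the first, second and third positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def changed_positions(ref_codon: str, alt_codon: str) -> list:
    """1-based codon positions at which two codons differ."""
    return [p + 1 for p, (a, b) in enumerate(zip(ref_codon, alt_codon)) if a != b]


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change as synonymous, missense or nonsense ('*' = stop)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


if __name__ == "__main__":
    for ref, alt in [("CTG", "CTA"), ("CTG", "CAG"), ("CGA", "TGA")]:
        print(ref, "->", alt, classify_substitution(ref, alt), changed_positions(ref, alt))
```

Third-position changes are often, but not always, silent: TGG to TGA, for instance, turns a tryptophan codon into a stop codon.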

The ATG numbered 1 in Figure 2 is the translation initiation site, based on
homology with the rat nCL1, and lies within a sequence with only 5 nucleotides out
of 8 in common with the Kozak consensus sequence (Kozak, 1984). Putative
CCAAT and TATA boxes were observed 590 or 324 bp (CCAAT) and 544 or 33 bp
(TATA) upstream of the initiating ATG codon, respectively (Bucher, 1990). A GC
box binding the Sp1 protein (Dynan et al., 1983) was identified at position -477.
Consensus sequences corresponding to potential muscle-specific regulatory
elements were identified (Fig. 2). These include a myocyte-specific enhancer-
binding factor 2 (MEF2) binding site (Cserjesi, 1991), a CArG box (Minty,
1986) and 6 E-boxes (binding sites for basic helix-loop-helix proteins, frequently
found in members of the MyoD family; Blackwell and Weintraub, 1990). The functional
significance of these putative transcription factor binding sites in the regulation
of nCL1 gene expression remains to be established.

Two potential AAUAAA polyadenylation signals were identified 520 and 777
bp downstream of the TGA stop codon. Sequencing of a partial nCL1 cDNA
containing a poly(A) tail demonstrated that the first AAUAAA is the
polyadenylation signal. The latter is embedded in a region well conserved with
the rat nCL1 sequence and is followed after 4 bp by a G/T cluster, present in
most genes 3' of the polyadenylation site (Birnstiel et al., 1985). The 3'
untranslated region of the nCL1 mRNA is 565 bp long. The predicted length of
the cDNA should therefore be approximately 3550 or 3000 bp.

b) Comparison with calpains

The sequence of the human nCL1 gene was compared with those of other
calpains (Figure 3). The most telling comparisons are with the
homologous rat (Accession no J05121), bovine (Accession no U07858) and
porcine (Accession no U05678) sequences. The accession numbers refer to
those of the international gene banks, such as GenBank (NIH) or the EMBL Database
(EMBL, Heidelberg). High local similarities between the human and rat DNA
sequences are even observed in the 5' (75%) and in different parts of the 3'
untranslated regions (over 60%) (data not shown). The high degree of sequence
homology shown by the human and rat nCL1 genes in their untranslated
regions is suggestive of evolutionary pressure on common putative regulatory
sequences.

c) Genomic organisation of the nCL1 gene

A comparison of the published nCL1 human cDNA (Sorimachi et al., 1989)
with the corresponding genomic sequence led to the identification of 24 exons
ranging in length from 12 bp (exon 12) to 309 bp (exon 1), with a mean size of
about 100 bp (Figure 1). The size of introns ranges from 86 bp to about 10-16 kb for
intron 1.






wo 96/16175 



PCT/EP9SMM575 






The intron-exon boundaries shown in Table 2 adhere closely to the
5' and 3' splice site consensus sequences (Shapiro and Senapathy, 1987).
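The percentages in Table 2 come from a position-based consensus scoring scheme. As a rough sketch of the normalisation used in this kind of score (the observed total rescaled between the worst and best achievable totals), the Python below scores a short motif against a per-position base frequency table. The table TOY_DONOR is hypothetical and is not the published Shapiro and Senapathy (1987) matrix, so its numbers are not comparable with Table 2; only the arithmetic is illustrated, for a donor-type motif.

```python
from typing import Dict, List

# HYPOTHETICAL per-position base frequencies (percent) for a 5-base donor-like
# motif: last two exon bases, the invariant "gt", and one further intron base.
# These are toy values, NOT the Shapiro and Senapathy (1987) table.
Matrix = List[Dict[str, float]]

TOY_DONOR: Matrix = [
    {"A": 60, "C": 15, "G": 15, "T": 10},
    {"A": 5, "C": 5, "G": 80, "T": 10},
    {"A": 0, "C": 0, "G": 100, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 100},
    {"A": 60, "C": 5, "G": 25, "T": 10},
]


def consensus_score(site: str, matrix: Matrix) -> float:
    """Return 100 * (observed - worst) / (best - worst) for a candidate site."""
    site = site.upper()
    if len(site) != len(matrix):
        raise ValueError("site length must equal the number of matrix positions")
    observed = sum(column[base] for column, base in zip(matrix, site))
    best = sum(max(column.values()) for column in matrix)
    worst = sum(min(column.values()) for column in matrix)
    return 100.0 * (observed - worst) / (best - worst)


if __name__ == "__main__":
    for candidate in ("AGGTA", "CAGTG", "TAAAC"):
        print(candidate, round(consensus_score(candidate, TOY_DONOR), 1))
```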
Table 2: Sequences at the intron-exon junctions. A score expressing adherence
to the consensus was calculated for each site according to Shapiro and
Senapathy (1987). Sequences of exons and introns are in upper and lower
case, respectively. Exon sizes are given in parentheses.

Exon (size)        Splice donor site    Score (%)   Intron      Score (%)   Splice acceptor site
Exon 1 (309 bp)    ...CTCCGgtgagt...    88.5        Intron 1    99.0        ...tmtgtncacagGAAAT...
Exon 2 (70 bp)     ...GCTAGgtagga...    83.5        Intron 2    90.0        ...gtgtagcagcagGGGAC...
Exon 3 (119 bp)    ...TCCAGgtgagg...    92          Intron 3    81.5        ...acgcttctgtgcagTTCTG...
Exon 4 (134 bp)    ...GCTAAgtaagc...    82          Intron 4    81.5        ...atcactctctaagGCTCC...
Exon 5 (169 bp)    ...TTGATgtaagt...    87          Intron 5    ?           ...ccatcgggcacagGATGG...
Exon 6 (144 bp)    ...CCCGGgtgtgt...    77.5        Intron 6    ?           ...ttactgactacagACAAT...
Exon 7 (84 bp)     ...ATGAGgtnncc...    94          Intron 7    78          ...tctgtgtgacaagGTCCC...
Exon 8 (86 bp)     ...CATAGgtagn...     89          Intron 8    ?           ...cattncccaccagATGGA...
Exon 9 (78 bp)     ...TTCTGgtgact...    88          Intron 9    92          ...uccaacaacagGATGT...
Exon 10 (161 bp)   ...CCCAGgtccca...    80          Intron 10   68.5        ...ttctgggggtgcagATACT...
Exon 11 (170 bp)   ...ACGAGgtgtgt...    85.5        Intron 11   86          ...tgtttcttctcaagGTTCC...
Exon 12 (12 bp)    ...AAGAGgtatag...    70          Intron 12   87          ...tccccataacagATCCA...
Exon 13 (209 bp)   ...TCTGAgtgagt...    76.5        Intron 13   ?           ...tgtaucacacagGGAAG...
Exon 14 (37 bp)    ...CAGTGgtgagt...    89          Intron 14   93.5        ...atncttatgcagAAAAA...
Exon 15 (18 bp)    ...CCAAGgtaggt...    89          Intron 15   87          ...cacctctaccagCCCAT...
Exon 16 (114 bp)   ...CACAGgtgtct...    80          Intron 16   88          ...ttgtgcaccacagCCACA...
Exon 17 (78 bp)    ...GAGATgtgagt...    84          Intron 17   92.5        ...cccncacacagGACAT...
Exon 18 (58 bp)    ...CAAACgtgagt...    83          Intron 18   90          ...xtccatccccccagACAAG...
Exon 19 (65 bp)    ...TGGATgtatcc...    56          Intron 19   88          ...caccacaccagACAGA...
Exon 20 (69 bp)    ...GGCAGgtggga...    80          Intron 20   94          ...mtctattgccagAAATA...
Exon 21 (79 bp)    ...CGCAGgtgctg...    66          Intron 21   91          ...ggtcccaccacagGATTC...
Exon 22 (117 bp)   ...GTTCAgtaagt...    79          Intron 22   93.5        ...gcattctucacaeGAGCT...
Exon 23 (59 bp)    ...TGGAGgtaaag...    81          Intron 23   79          ...ggcacttctttcacTGGCT...
Exon 24 (27 bp)
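The exon sizes in Table 2 add up to exactly 2466 bp, the length of the coding sequence given in the legend of Figure 2, which suggests that they are counted from the A of the ATG to the TGA. On that assumption, the Python sketch below turns the sizes into cumulative cDNA coordinates and uses them to place a nucleotide position in an exon and a codon; the function names are illustrative. Applied to the nucleotide positions of Table 4, it reproduces the codon numbers given there and the exon numbers cited for Figure 7 (exons 2, 8, 13 and 22).

```python
from bisect import bisect_left
from itertools import accumulate

# Exon sizes (bp) for exons 1-24 as read from Table 2. Assumption: these are the
# coding portions, numbered from the A of the ATG (they sum to 2466 bp).
EXON_SIZES = [309, 70, 119, 134, 169, 144, 84, 86, 78, 161, 170, 12,
              209, 37, 18, 114, 78, 58, 65, 69, 79, 117, 59, 27]

# Last cDNA position (1-based, inclusive) covered by each exon.
EXON_ENDS = list(accumulate(EXON_SIZES))


def exon_of(position: int) -> int:
    """Return the exon number containing a coding-sequence position."""
    if not 1 <= position <= EXON_ENDS[-1]:
        raise ValueError(f"position {position} lies outside the coding sequence")
    # bisect_left returns the index of the first exon whose end is >= position.
    return bisect_left(EXON_ENDS, position) + 1


def codon_of(position: int) -> int:
    """Return the codon (amino acid) number for a coding-sequence position."""
    return (position - 1) // 3 + 1


if __name__ == "__main__":
    print("total coding length:", EXON_ENDS[-1])
    for pos in (328, 1061, 1715, 2306):
        print(pos, "exon", exon_of(pos), "codon", codon_of(pos))
```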

When the genomic sequence was submitted to GRAIL analysis (Uberbacher
et al., 1991), 11 exons were correctly recognised, 4 were not identified, 6 were
inadequately defined and 2 were too small to be recognised (data not shown).

As already noted, the nCL1 gene has three unique sequence blocks: NS
(amino acid residues 1 to 61), IS1 (residues 267 to 329) and IS2 (residues 578
to 653). It is interesting to note that each of these sequences, as well as the
nuclear translocation signal inside IS2, is essentially flanked by introns (Fig. 4).
The exon-intron organisation of the human nCL1 gene is similar to that reported for
the chicken CANP (the only other large-subunit calpain gene whose genomic
structure is known; Emori et al., 1986).

Four microsatellite sequences were identified. Two of them are in the distal
part of the first intron: an (AT)14 and a previously identified mixed-pattern
microsatellite, S774G4B8, which was demonstrated to be non-polymorphic
(Fougerousse et al., 1994). A (TA)7(CA)4(GA)13 repeat was identified in the second
intron, and genotyping of 64 unrelated CEPH individuals revealed two alleles
(with frequencies of 0.10 and 0.90). The fourth microsatellite is a mixed
(CA)n(TA)m repeat present in the 9th intron. The latter and the (AT)14 repeat
have not been investigated for polymorphism. Fourteen repetitive sequences of
the Alu family and one Mer2 repeat were identified in the nCL1 gene (Fig. 1C),
which thus has on average one Alu element per 2.5 kb.
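The repeat notation used above, such as (AT)14 or (TA)7(CA)4(GA)13, describes runs of a two-base unit. As an illustrative sketch only, the Python below finds perfect dinucleotide runs with a regular expression and computes the expected heterozygosity 1 - Σp², a common index of how informative a marker is. For the two-allele marker in intron 2 (frequencies 0.10 and 0.90) this gives about 0.18. The minimum run length and the test sequence are arbitrary choices, and a practical repeat finder would also tolerate interrupted or imperfect units.

```python
import re
from typing import Iterator, Sequence, Tuple


def dinucleotide_runs(seq: str, min_units: int = 5) -> Iterator[Tuple[int, str, int]]:
    """Yield (0-based start, repeat unit, number of units) for perfect runs."""
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_units - 1))
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if unit[0] == unit[1]:
            continue  # ignore mononucleotide stretches such as AAAAAA
        yield match.start(), unit, len(match.group(0)) // 2


def expected_heterozygosity(allele_freqs: Sequence[float]) -> float:
    """1 - sum(p_i^2): probability that two randomly drawn alleles differ."""
    return 1.0 - sum(p * p for p in allele_freqs)


if __name__ == "__main__":
    test_seq = "GC" + "AT" * 14 + "GG"  # toy sequence containing an (AT)14 run
    print(list(dinucleotide_runs(test_seq)))
    print(round(expected_heterozygosity([0.10, 0.90]), 2))
```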

Southern blot experiments (Ohno et al., 1989) and STS screening (data not
shown) suggest that there is only one copy per genome of this member of the
calpain family.

EXAMPLE 3

Expression of the nCL1 gene

The pattern of tissue specificity was investigated by northern blot
hybridisation with a genomic subclone probe from cosmid 1F11 spanning exons
20 and 21. There is no evidence for the existence of an alternatively spliced form
of nCL1, although this cannot be excluded. A transcript of about 3.4-3.6 kb was
detected in skeletal muscle mRNA (Figure 5). This size therefore favours
position -544 as the functional TATA box.
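The argument from transcript size can be made explicit with a little arithmetic. The Python sketch below adds an estimated 5' untranslated region to the 2466 bp coding sequence and the 565 bp 3' untranslated region reported in Example 2. It assumes, as is typical but not stated in the text, that transcription starts about 30 bp downstream of the TATA box, and it ignores the poly(A) tail; the constant and function names are illustrative.

```python
CDS_BP = 2466        # ATG to TGA inclusive (Example 2)
UTR3_BP = 565        # 3' untranslated region (Example 2)
TSS_OFFSET_BP = 30   # ASSUMPTION: start site ~30 bp downstream of the TATA box

OBSERVED_KB = (3.4, 3.6)  # transcript size range seen on the northern blot


def predicted_mrna_bp(tata_upstream_bp: int, tss_offset_bp: int = TSS_OFFSET_BP) -> int:
    """Estimate mRNA length (without poly(A) tail) for a given TATA box position."""
    utr5_bp = max(tata_upstream_bp - tss_offset_bp, 0)
    return utr5_bp + CDS_BP + UTR3_BP


def fits_observed(length_bp: int, observed_kb=OBSERVED_KB) -> bool:
    """True if the predicted length falls inside the observed size range."""
    low, high = observed_kb
    return low * 1000 <= length_bp <= high * 1000


if __name__ == "__main__":
    for tata in (544, 33):
        length = predicted_mrna_bp(tata)
        print(f"TATA at -{tata}: ~{length} bp, within observed range: {fits_observed(length)}")
```

Under these assumptions the -544 candidate gives roughly 3.5 kb, in line with the approximately 3550 bp predicted in Example 2 and inside the 3.4-3.6 kb band, whereas the -33 candidate gives about 3.0 kb.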

Transcription studies suggested that it is an active gene rather than a
pseudogene, and its muscle-specific pattern of expression is consistent with the
phenotype of this disorder (Sorimachi et al., 1989, and Figure 5).

EXAMPLE 4 

Mutation screening 

nCL1 fulfils both positional and functional criteria to be a candidate gene for
LGMD2A. To evaluate its role in the aetiology of this disorder, nCL1 was
systematically screened in 38 LGMD2 families for the presence of nucleotide
changes, using a combination of heteroduplex (Keen et al., 1991) and direct
sequence analyses.

PCR primers were designed to specifically amplify the exons and splice
junctions, and also the regions containing the putative CCAAT and TATA boxes and the
polyadenylation signal of the gene, as shown in Table 3.

Table 3: PCR primers used for the analysis of the nCL1 gene in LGMD patients.

Amplified region         Primer sequences (5'-3')     Size (bp)   Annealing temp. (°C)
promoter                 TTCAGTACCTCCCGTTCACC         296         59
                         GATGCTTGAGCCAGGAAAAC
exon 1                   CTTTCCTTGAAGGTAGCTGTAT       438         60
                         GAGGTGCTGAGTGAGAGGAC
exon 2                   ACTCCGTCTCAAAAAAATACCT       239         57
                         ATTGTCCCTTTACCTCCTGG
exon 3                   TGGAAGTAGGAGAGTGGGCA         354         58
                         GGGTAGATGGGTGGGAAGTT
exon 4                   GAGGAATGTGGAGGAAGGAC         292         59
                         TTCCTGTGAGTGAGGTCTCG
exon 5                   GGAACTCTGTGACCCCAAAT         325         56
                         TCCTCAAACAAAACATTCGC
exon 6                   GTTCCCTACATTCTCCATCG         315         57
                         GTTATTTCAACCCAGACCCTT
exon 7                   AATGGGTTCTCTGGTTACTGC        333         56
                         AGCACGAAAAGCAAAGATAAA
exon 8                   GTAAGAGATTTGCCCCCCAG         321         58
                         TCTGCGGATCATTGGTTTTG
exon 9                   CCTTCCCTTCTTCCTGCTTC         173         56
                         CTCTCTTCCCCACCCTTACC
exon 10                  CCTCCTCACCTGCTCCCATA         251         56
                         TTTTTCGGCTTAGACCCTCC
exon 11                  TGTGGGGAATAGAAATAAATGG       355         57
                         CCAGGAGCTCTGTGGGTCA
exon 12                  GGCTCCTCATCCTCATTCACA        312         61
                         GTGGAGGAGGGTGAGTGTGC
exon 13                  TGTGGCAGGACAGGACGTTC         337         60
                         TTCAACCTCTGGAGTGGGCC
exon 14                  CACCAGAGCAAACCGTCCAC         230         61
                         ACAGCCCAGACTCCCATTCC
exon 15                  TTCTCTTCTCCCTTCACCCT         225         57
                         ACACACTTCATGCTCTCTACCC
exon 16                  CCGCCTATTCCTTTCCTCTT         331         56
                         GACAAACTCCTGGGAAGCCT
exon 17                  ACCTCTGACCCCTGTGAACC         270         61
                         TGTGGATTTGTGTGCTACGC
exon 18                  CATAAATAGCACCGACAGGGA        258         59
                         GGGATGGAGAAGAGTGAGGA
exon 19                  TCCTCACTCTTCTCCATCCC         159         57
                         ACCCTGTATGTTGCCTTGG
exons 20-21              GGGGATTTTGCTGTGTGCTG         333         61
                         ATTCCTGCTCCCACCGTCTC
exon 22                  CACAGAGTGTCCGAGAGGCA         282         57
                         GGAGATTATCAGGTGAGATGCC
exons 22-23              CAGAGTGTCCGAGAGGCAGGG        608         61
                         CGTTGACCCCTCCACCTTGA
exon 24                  GGGAAAACATGCACCTTCTT         375         58
                         TAGGGGGTAAAATGGAGGAG
polyadenylation signal   ACTAACTCAGTGGAATAGGG         413         56
                         GGAGCTAGGATAGCTCAAT



PCR products made on DNA from blood of specific LGMD2A patients were
then subjected either to heteroduplex analysis or to direct sequencing,
depending on whether the mutation, based on haplotype analysis, was expected
to be heterozygous or homozygous, respectively. It was occasionally necessary
to clone the PCR products to identify the mutations precisely (e.g., for
microdeletions or insertions and for some heterozygotes). Disease-associated
mutations are summarised in Table 4 hereunder and their positions along the
protein are shown in Fig. 4.

Table 4: nCL1 mutations in LGMD2A families.

Codons and amino acid positions are numbered on the basis of the cDNA
sequence starting from the ATG.

Exon   Families         Nucleotide position   Nucleotide change   Amino acid position   Amino acid change
2      B519*            328                   CGA -> TGA          110                   Arg -> stop
4      M42              545                   CTG -> CAG          182                   Leu -> Gln
4      M1394; M2888     550                   CAA -> CA           184                   frameshift
5      M35; M37         701                   GGG -> GAG          234                   Gly -> Glu
6      M32              945                   CGG -> CG           315                   frameshift
8      M2407*           1061                  GTG -> GGG          354                   Val -> Gly
8      M1394            1079                  TGG -> TAG          360                   Trp -> stop
11     M2888            1468                  CGG -> TGG          490                   Arg -> Trp
13     R12*             1715                  CGG -> CAG          572                   Arg -> Gln
19     R27              2069-2070             deletion AC         690                   frameshift
21     R14; R17         2230                  AGC -> GGC          744                   Ser -> Gly
22     A*; B501*; M32   2306                  CGG -> CAG          769                   Arg -> Gln
22     B505             2313-2316             deletion AGAC       771-772               frameshift
22     R14; B505        2362-2363             AG -> TCATCT        788                   frameshift

Restriction sites affected (assignment to individual rows not legible in the source): MspI; AluI (-).

The first letter of the family code refers to the population of origin:
B = Brazil, M = metropolitan France, R = Isle of La Reunion, A = Amish.
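The numbering convention stated for Table 4 (codons counted from the A of the
ATG start codon) can be checked with a short sketch. The helper `codon_of` is
hypothetical, written only for illustration: it converts a cDNA nucleotide
position into a codon number and the base position within that codon, and
confirms the nucleotide/amino acid pairs listed in the table.

```python
def codon_of(nt_pos: int) -> tuple:
    """Hypothetical helper: map a cDNA position (1 = A of the ATG start codon)
    to (codon number, base 1-3 within that codon)."""
    if nt_pos < 1:
        raise ValueError("cDNA positions are numbered from 1 (the A of ATG)")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


# (first affected nucleotide, amino acid position) pairs taken from Table 4
TABLE4_POSITIONS = [
    (328, 110), (545, 182), (550, 184), (701, 234), (945, 315),
    (1061, 354), (1079, 360), (1468, 490), (1715, 572), (2069, 690),
    (2230, 744), (2306, 769), (2313, 771), (2362, 788),
]

for nt, aa in TABLE4_POSITIONS:
    codon, base = codon_of(nt)
    assert codon == aa, (nt, codon, aa)  # every Table 4 pair is consistent
    print(f"nucleotide {nt}: codon {codon}, base {base}")
```

For example, nucleotide 328 falls on the first base of codon 110, consistent
with the CGA -> TGA change at Arg110, and nucleotide 2306 on the second base of
codon 769 (CGG -> CAG).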

Each mutation was confirmed by heteroduplex analysis, by sequencing of
both strands in several members of the family, or by enzymatic digestion when
the mutation modified a restriction site. Segregation analyses of the
mutations, performed on DNA from all available members of the families,
confirmed that these sequence variations lie on the parental chromosome
carrying the LGMD2A mutation. To exclude the possibility that the missense
substitutions might be polymorphisms, their presence was systematically tested
in a control population: none of these mutations was seen among 120 control
chromosomes from the CEPH reference families.

EXAMPLE 5 

Analysis of chromosome 15-ascertained families
The initial screening for causative mutations was performed on families in
which the LGMD gene had been mapped to chromosome 15. These included families
from the Island of La Reunion (Beckmann et al., 1991), from the Old Order Amish
of northern Indiana (Young et al., 1992) and 2 Brazilian families (Passos-Bueno
et al., 1993).
a) Reunion Island families

Genealogical studies and the geographic isolation of the families from the Isle
of La Reunion suggested a single founder effect. Genetic analyses are, however,
inconsistent with this hypothesis, as the families show haplotype
heterogeneity: at least six different carrier chromosomes are encountered
(affected individuals in several families being compound heterozygotes).
Distinct mutations corresponding to four of these six haplotypes have been
identified thus far.

In family R14, exons 13, 21 and 22 showed evidence of sequence variation on
heteroduplex analysis (Fig. 6). Sequencing of the corresponding PCR products
revealed (i) a polymorphism in exon 13, (ii) a missense mutation (A->G) in exon
21 changing Ser744 to glycine in the loop of the second EF-hand in domain IV of
the protein (Figure 4), and (iii) a frameshift mutation in exon 22. The exon 21
mutation and the exon 13 polymorphism form a haplotype that is also found in
family R17. Subcloning of the PCR products was necessary to identify the exon 22
mutation: sequencing of several clones revealed a replacement of AG by TCATCT
(data not shown). This frameshift mutation causes premature termination at
nucleotide 2400, where an in-frame stop codon occurs (Figure 4).

The affected individuals in family R12 are homozygous for all markers of the
LGMD2A interval (Allamand, submitted). Sequencing of the PCR products of exon 13
revealed a G to A transition at base 1715 of the cDNA, resulting in the
substitution of glutamine for Arg572 (Figure 7) within domain III, a residue that
is highly conserved throughout all known calpains. This mutation, detectable by
the loss of an MspI restriction site, is present only in this family and in no
other examined LGMD2A family or unrelated control.
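The restriction test amounts to asking whether the MspI recognition sequence
(CCGG) survives the mutation. A minimal sketch, using only the short exon 13
sequences shown in Figure 7; a real digest is run on the whole PCR product, and
`has_site` is a hypothetical helper introduced for illustration.

```python
MSPI_SITE = "CCGG"  # MspI recognition sequence (cuts C^CGG)


def has_site(seq: str, site: str = MSPI_SITE) -> bool:
    """Return True if the recognition sequence occurs in seq.
    CCGG is its own reverse complement, so checking one strand suffices."""
    return site in seq.upper()


normal_exon13 = "TCCTCCGGGTCTT"  # normal sequence, Figure 7
r12_exon13 = "TCCTCCAGGTCTT"     # family R12 (CGG -> CAG), Figure 7

print(has_site(normal_exon13))  # True: MspI site present
print(has_site(r12_exon13))     # False: MspI site lost
```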

In family R27, heteroduplex analysis followed by sequencing of the PCR
products of an affected child revealed a two-base-pair deletion in exon 19
(Figure 6 and Table 4): one AC out of three is missing at this position of the
sequence. The deletion, at position 2069-2070 of the cDNA sequence, causes a
frameshift leading to a premature stop codon (Figure 4).

b) Amish families 

As expected from multiple consanguineous links, the examined LGMD2A patients
from the northern Indiana Amish were homozygous for the haplotype of the
chromosome bearing the mutant allele (Allamand, submitted). A (G->A) missense
mutation was identified at nucleotide 2306 within exon 22 (Fig. 7). The
resulting codon change is CGG to CAG, transforming Arg769 into glutamine. This
residue, which is conserved throughout all members of the calpain family in all
species, is located in domain IV of the protein, within the third EF-hand at the
helix-loop junction (ref). This mutation was found in the homozygous state in
all patients from 12 chromosome 15-linked Amish families, in agreement with the
haplotype analysis. We also screened six southern Indiana Amish LGMD families
for which the chromosome 15 locus had been excluded by linkage analysis
(Allamand, ESHG, submitted; ASHG 94). As expected, this nucleotide change was
not present in any of the patients from these families, confirming the genetic
heterogeneity of this disease within this genetically related isolate.
c) Brazilian families

As a result of consanguineous marriages, two Brazilian families (B501,
B519) are homozygous for extended LGMD2A carrier haplotypes (data not shown).
Sequencing of PCR products from affected individuals of these families
demonstrated that family B501 carries the same exon 22 mutation as the northern
Indiana Amish patients (Figure 7), but embedded in a completely different
haplotype. In family B519, the patients carry a C to T transition at nucleotide
328 in exon 2, replacing the codon for Arg110 with a TGA stop codon (Figure 7,
Table 4) and thus presumably leading to a severely truncated protein (Figure 4).

d) Analysis of other LGMD families

Having validated the role of the candidate gene in the chromosome
15-ascertained families, we next used heteroduplex analysis to examine LGMD
families for which linkage data were not informative. These included one
Brazilian (B505) and 13 metropolitan French pedigrees.
Heteroduplex bands were revealed for exons 1, 3, 4, 5, 6, 8, 11 and 22 in one
or more patients (Figure 6). Of all sequence variants, 10 were identified as
possible pathogenic mutations (5 missense, 1 nonsense and 4 frameshift
mutations) and 3 as polymorphisms with no change in the amino acid sequence of
the protein. All causative mutations identified are listed in Table 4 above.
Identical mutations were uncovered in apparently unrelated families. The
mutations shared by families M35 and M37, and by M2888 and M1394, respectively,
are likely to be the consequence of independent events, since they are embedded
in different marker haplotypes. In contrast, the point mutation in exon 22 of
the Amish and M32 kindreds probably corresponds to the same mutational event, as
both chromosomes share a common four-marker haplotype (774G4A1-
774G4A10-774G454D-774G4A2) around nCL1 (data not shown), possibly reflecting a
common ancestor. The same holds true for the AG to TCATCT substitution in exon
22 in families B505 and R14. The exon 8 (T->G) transversion is present on both
carrier chromosomes of M2407, the only metropolitan family homozygous by
haplotype, possibly reflecting undocumented consanguinity. For some families
(M40, for example), no disease-causing mutation has been detected thus far.

In addition to the polymorphism present in exon 13 in families R14 and R17
(position 668) and to the intragenic microsatellites, four additional neutral
variations were detected: a (T->C) transition at position 96, abolishing a DdeI
restriction site in exon 1, in M31; a (C->T) transition in exon 3 (position 495)
in M40, and in M37, where it forms a haplotype with the exon 5 mutation (in the
former family, this polymorphism does not cosegregate with the disease); a
(T->C) transition at position -428 in the paternally derived promoter in M42,
which was also found in healthy controls; and a variable poly(G) tract in intron
22, close to the splice site, in families R20, R11, R19, M35 and M37. The latter
is also present in members of the CEPH families but is not useful as a genetic
marker, because mononucleotide repeat alleles are difficult to visualise and
interpret.

In total, sixteen independent mutational events representing fourteen
different mutations were identified. All mutations cosegregate with the disease
in LGMD2A families. The characterised morbid calpain alleles contain nucleotide
changes that were not found in alleles from normal individuals. The discovery of
two nonsense and five frameshift mutations in nCL1 supports the hypothesis that
a deficiency of this product causes LGMD2A. All seven of these mutations result
in a premature in-frame stop codon, leading to the production of truncated and
presumably inactive proteins (Figure 4). Evidence for the pathogenicity of the
missense mutations comes from (1) the relatively high incidence of such
mutations among LGMD2A patients: although it is difficult, in the absence of
functional assays, to distinguish a polymorphism from a morbid mutation, the
occurrence of different missense mutations in this gene cannot all be accounted
for as rare private polymorphisms; (2) the failure to observe these mutations in
control chromosomes; and (3) the occurrence of mutations in evolutionarily
conserved residues and/or in regions of documented functional importance. Four
of the seven missense mutations change an amino acid that is conserved in all
known members of the calpain family in all species (Figure 3). Two of the
remaining mutations affect less conserved amino acid residues but are located in
important functional domains: the substitution V354G in exon 8 lies 4 residues
before the asparagine of the active site, and S744G in exon 21 lies within the
loop of the second EF-hand and may impair the calcium-dependent regulation of
calpain activity or the interaction with a small subunit (Figure 4). Several
missense mutations change a hydrophobic residue to a polar one or vice versa
(Table 4), possibly disrupting higher-order structures.
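The split into nonsense, frameshift and missense changes can be reproduced
mechanically from the codon changes in Table 4. The following is a sketch, not
the authors' procedure: it assumes the standard genetic code, treats any length
change that is not a multiple of three as a frameshift, and writes deletions as
an empty alternative allele. The names `classify` and `TABLE4_CHANGES` are
illustrative only.

```python
from collections import Counter

# Standard genetic code, indexed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify(ref: str, alt: str) -> str:
    """Classify a change written as in Table 4 (ref -> alt)."""
    if (len(alt) - len(ref)) % 3 != 0:
        return "frameshift"
    if len(ref) != 3 or len(alt) != 3:
        return "in-frame indel"  # none in Table 4; not analysed further here
    before, after = CODON_TABLE[ref], CODON_TABLE[alt]
    if after == "*":
        return "nonsense"
    if before == after:
        return "silent"
    return f"missense {before}->{after}"


# Nucleotide changes from Table 4 ("deletion AC" is written as ("AC", "")).
TABLE4_CHANGES = [
    ("CGA", "TGA"), ("CTG", "CAG"), ("CAA", "CA"), ("GGG", "GAG"),
    ("CGG", "CG"), ("GTG", "GGG"), ("TGG", "TAG"), ("CGG", "TGG"),
    ("CGG", "CAG"), ("AC", ""), ("AGC", "GGC"), ("CGG", "CAG"),
    ("AGAC", ""), ("AG", "TCATCT"),
]

labels = [classify(ref, alt) for ref, alt in TABLE4_CHANGES]
counts = Counter(label.split()[0] for label in labels)
print(counts)  # Counter({'missense': 7, 'frameshift': 5, 'nonsense': 2})
```

The tally agrees with the text: two nonsense, five frameshift and seven
missense mutations among the fourteen listed.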
METHODS 

Description of the patients

The LGMD2A families analysed were from 4 different geographic origins. They
included 3 Brazilian families, 13 interrelated nuclear families from the Isle of
La Reunion, 10 French metropolitan families and 12 US Amish families. The
majority of these families had previously been assigned to the chromosome 15
group by linkage analysis (Beckmann et al., 1991; Young et al., 1992;
Passos-Bueno et al., 1993). However, some families from metropolitan France, as
well as one Brazilian family, B505, had non-significant LOD scores for
chromosome 15. Genomic DNA was obtained from peripheral blood lymphocytes.

Sequencing of cosmid 1F11

Cosmid 1F11 (Figure 1C) was subcloned following DNA preparation with the
Qiagen procedure (Qiagen Inc., USA) and partial digestion with Sau3A, RsaI or
AluI. Size-selected restriction fragments were recovered from low-melting-point
agarose and ligated into M13 or Bluescript (Stratagene, USA) vectors before
electroporation into E. coli. Recombinant colonies were picked into 100 [...]
10 mM Tris-HCl, pH 9.0, 50 mM KCl, 1.5 mM MgCl2, 0.1% Triton X-100, 0.01%
gelatin, 200 µM of each dNTP and 1 U of Taq polymerase (Amersham), with 100 ng
[...]. Amplification consisted of 5 min denaturation at 95 °C, followed by 30
cycles of 40 s denaturation at 92 °C and 30 s annealing at 50 °C. PCR products
were purified through Microcon devices (Amicon, USA) and sequenced using the
dideoxy chain-termination method on an ABI sequencer (Applied Biosystems,
Foster City, USA). The sequences were analysed and alignments performed using
the XBAP software of the Staden package, version 93.9 (Staden, 1982). Gaps
between sequence contigs were filled by walking with internal primers. EcoRI
restriction mapping of the cosmids was performed essentially as described in
Sambrook et al. (1989).
Northern Blot analysis 

The probes were labelled by random priming with [α-32P]dCTP.
Hybridisation to human multiple tissue Northern blots was performed as
recommended by the manufacturer (Clontech, USA).
Analysis of PCR products from LGMD2A families

One hundred ng of human DNA were used per PCR under the buffer and cycling
conditions described in Fougerousse (1994) (annealing temperatures are shown in
Table 3). Heteroduplex analysis (Keene et al., 1991) was performed by
electrophoresis of 10 µl of PCR products on 1.5 mm-thick Hydrolink MDE gels
(Bioprobe) at 500-600 V for 12-15 h, depending on the fragment length.
Migration profiles were visualised under UV light after ethidium bromide
staining.

For sequence analysis, the PCR products were subjected to dye-dideoxy
sequencing after purification through Microcon devices (Amicon, USA). When
necessary, depending on the nature of the mutation (e.g., frameshift mutations
or some heterozygotes), the PCR products were cloned using the TA cloning kit
from Invitrogen (UK). One µl of product was ligated to 25 ng of vector at 12 °C
overnight. After electroporation into XL1-Blue bacteria, several independent
clones were analysed by PCR and sequenced as described above.

The invention results from the finding that the nCL1 gene, when mutated, is
involved in the etiology of LGMD2A. This is contrary to what has been stated in
the literature, i.e. that the disease is accompanied by the presence of a
deregulated calpain. Identification of nCL1 as the defective gene in LGMD2A
represents the first example of a muscular dystrophy caused by a mutation
affecting a gene that is not a structural component of muscle tissue, in
contrast with previously identified muscular dystrophies such as Duchenne and
Becker (Bonilla et al., 1988), severe childhood autosomal recessive (Matsumara
et al., 1992), Fukuyama (Matsumara et al., 1993) and merosin-deficient
congenital muscular dystrophies (Tome et al., 1994).

Understanding of the LGMD2A phenotype needs to take into account the fact
that there is no active nCL1 protein in several patients, a loss compatible with
the recessive inheritance of this disease. Simple models in which this protease
would be involved in the degradation or destabilisation of structural
components of the cytoskeleton, extracellular matrix or dystrophin complex can
therefore be ruled out. Furthermore, immunocytochemical studies on LGMD2 muscle
biopsies show no signs of such alterations (Matsumara et al., 1993; Tome et al.,
1994). Likewise, since LGMD2A myofibers are apparently no different from other
dystrophic ones, it seems unlikely that this calpain plays a role in myoblast
fusion, as proposed for ubiquitous calpains (Wang et al., 1989).

All the data disclosed in these examples confirm that the nCL1 gene, when
mutated, is a major gene involved in the disease.

The fact that morbidity results from the loss of an enzymatic activity raises 
hopes for novel pharmaco-therapeutic prospects. The availability of transgenic 
models will be an invaluable tool for these investigations. 

The invention also relates to the use of a nucleic acid or a nucleic acid
sequence of the invention, or of a protein encoded by the nucleic acid, for the
manufacture of a drug for the prevention or treatment of LGMD2.

The finding that a defective calpain underlies the pathogenesis of LGMD2A
may prove useful for identifying the other loci involved in the LGMDs. Other
forms of LGMD may indeed be caused by mutations in genes whose products are CANP
substrates or in genes involved in the regulation of nCL1 expression. Techniques
such as the two-hybrid selection system (Fields et al., 1989) could lend
themselves to the isolation of the natural protein substrate(s) of this calpain,
and thus potentially help to identify other LGMD loci.

The invention also relates to the use of all or part of the peptide sequence
of the enzyme encoded by the nCL1 gene, or of the enzyme itself, for screening
for ligands of this enzyme, which might also be involved in the etiology and
morbidity of LGMD2.

The ligands which might be involved are, for example, substrate(s),
activators or inhibitors of the enzyme.
The nucleic acids of the invention might also be used in a screening method
for determining the components that may act on the regulation of the gene's
expression.

A screening process using either the enzyme or a recombinant host cell
containing the nCL1 gene and expressing the enzyme is also part of the
invention.

The pharmacological methods and the uses of the nucleic acid and peptide
sequences of the invention are very powerful applications.

The methods used for such screening of ligands or regulatory elements are, for
example, those described for the screening of ligands using cloned receptors.

The identification of mutations in the nCL1 gene provides the means for
direct prenatal or presymptomatic diagnosis and carrier detection in families in
which both mutations have been identified. Accurate gene-based classification of
LGMD2A families should prove useful for the differential diagnosis of this
disorder.

The invention relates to a method for detecting a predisposition to LGMD2 in a
family or a human being, such method comprising the steps of:

- selecting one or more exons, or their flanking sequences, which are sensitive
in said family;

- selecting primers specific for this exon or these exons or their flanking
sequences, a specific example being the PCR primers of Table 3, or a hybrid
thereof;

- amplifying the nucleic acid sequence, the substrate for this amplification
being the DNA of the human being to be checked for the predisposition; and

- comparing the amplified sequence to the corresponding sequence derived
from Figure 2 or Figure 8.
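The final comparison step can be illustrated with a minimal sketch that lists
single-base differences between an amplified sequence and the corresponding
reference. It assumes the two sequences are already aligned and of equal
length, so it detects substitutions only; insertions and deletions such as
those in Table 4 would first require an alignment step. The function name
`substitutions` is hypothetical, and the example uses the short exon 22
sequences shown in Figure 7.

```python
def substitutions(reference: str, amplified: str, start: int = 1) -> list:
    """List (position, reference base, observed base) for each mismatch.

    Assumes pre-aligned sequences of equal length (substitutions only);
    `start` is the coordinate assigned to the first base of the fragment.
    """
    if len(reference) != len(amplified):
        raise ValueError("unequal lengths: align the sequences first")
    return [
        (start + i, ref_base, obs_base)
        for i, (ref_base, obs_base) in enumerate(zip(reference.upper(), amplified.upper()))
        if ref_base != obs_base
    ]


normal_exon22 = "CCATGCGGTACGC"   # normal sequence, Figure 7
patient_exon22 = "CCATGCAGTACGC"  # Amish patients, Figure 7

print(substitutions(normal_exon22, patient_exon22))  # [(7, 'G', 'A')]
```

The single mismatch at fragment position 7 is the middle G of the CGG codon for
Arg769, i.e. the G->A change at cDNA nucleotide 2306 listed in Table 4.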

Table 2 indicates the sequences of the intron-exon junctions; primers
comprising these junctions in their structure are also included in the
invention. Any other primers suitable for such RNA or DNA amplification may be
used in the method of the invention.

In the same way, any suitable amplification method, such as PCR (Polymerase
Chain Reaction®), NASBA® (Nucleic Acid Sequence-Based Amplification) or others,
may be used.
The methods usually used for the detection of single-site mutations, such as
ASO (allele-specific oligonucleotide), LCR or ARMS (Amplification Refractory
Mutation System), may be implemented with the specific primers of the invention.

The primers described in Tables 1 and 3, or including the junctions of
Table 2, or more generally including the flanking sequences of one of the 24
exons, are also part of the invention.

A kit for the detection of a predisposition to LGMD2 by nucleic acid
amplification is also within the scope of the invention; such a kit comprises at
least PCR primers selected from the group of:

a) those described in Table 1;

b) those described in Table 3;

c) those including the intron-exon junctions of Table 2;

d) those derived from the primers defined in a), b) or c).

The nucleic acid sequence of claims 1 to 3 may be inserted into a viral or
retroviral vector, said vector being able to transfect a packaging cell line.

The transfected packaging cell line may be used as a drug for gene therapy
of LGMD2.

The treatment of LGMD2 by gene therapy is implemented with a
pharmaceutical composition containing a component selected from the group of:

a) a nucleic acid sequence according to claims 1 to 4,

b) a cell line according to claim 24,

c) an amino acid sequence according to claims 5 to 9.
CLAIMS

1. A nucleic acid sequence comprising : 

1) the sequence represented in Figure 8; or 

2) the sequence represented in Figure 2; or 

3) a part of the sequence of Figure 2, with the proviso that it is able to
code for a protein having a calcium-dependent protease activity involved in an
LGMD2 disease; or

4) a sequence derived from a sequence defined in 1), 2) or 3) by
substitution, deletion or addition of one or more nucleotides, with the proviso
that said sequence still codes for said protease.

2. A nucleic acid sequence that is complementary to a nucleic acid
sequence according to claim 1.

3. A nucleic acid sequence comprising in its structure a nucleotide
sequence according to claim 1 or 2, under the control of regulatory elements,
and involved in the expression of calpain activity in an LGMD2 disease.

4. A nucleic acid sequence encoding the amino acid sequence represented
in Figure 2.

5. An amino acid sequence which is encoded by a nucleic acid sequence
according to claims 1 to 4, characterized in that it is a calcium-dependent
protease enzyme belonging to the calpain family, involved in the etiology of
LGMD2.

6. An amino acid sequence according to claim 5, characterized in that
either it contains the sequence represented in Figure 2, or it is the amino
acid sequence of Figure 2 modified by deletion, insertion and/or replacement of
one or more amino acids, with the proviso that such amino acid sequence has the
calpain activity involved in LGMD2 disease.

7. An amino acid sequence according to claim 5 or 6, characterized in that
LGMD2 is LGMD2A.

8. A host cell unable to express a calpain enzyme activity, characterized in
that it is transformed or transfected with a nucleic acid sequence comprising all
or part of the nucleic acid sequence according to any one of claims 1 to 4.
9. Use of a nucleic acid according to one of claims 1 to 4 or of a host cell
according to claim 8 in the manufacture of a drug for the prevention or the
treatment of an LGMD2 disease.

10. Use of an amino acid sequence according to claim 5 or 6 in the
manufacture of a drug for the prevention or the treatment of an LGMD2
disease.

11. Use according to claim 9 or 10, characterized in that LGMD2 is
LGMD2A.

12. Use of an amino acid sequence according to claims 5 to 7 for the
screening of the ligands of said amino acid sequence, said ligand being selected
from a group consisting of substrate(s), co-factors or regulatory components.

13. Use of a nucleic acid sequence according to one of claims 1 to 4 in a
screening method for the determination of the components which may act on the
regulation of gene expression of calpain.

14. Use of a host cell according to claim 8 in a screening method for the
determination of components active on the expression of the calpain.

15. A method for detecting a predisposition to an LGMD2 disease in a
family or a human being, such method comprising the steps of:

- selecting one or more exons of the gene or their flanking sequences;

- selecting primers specific for these exons or their flanking sequences, or
a hybrid thereof;

- amplifying the nucleic acid sequences with these primers, the substrate for
this amplification being the DNA of a human being; and

- comparing the amplified sequence to the corresponding sequence derived
from Figure 2 or Figure 8.

16. The method according to claim 15, characterized in that the primers are
selected from the group of:

a) those described in Table 1;

b) those described in Table 3;

c) those including the intron-exon junctions of Table 2; and

d) those derived from the primers in a), b) or c).

17. [...] characterized in that LGMD2 is LGMD2A.
18. A kit for the detection of a predisposition to LGMD2 by nucleic acid
amplification, characterized in that it comprises primers selected from the
group of:

a) those described in Table 1;

b) those described in Table 3;

c) those including the intron-exon junctions of Table 2; and

d) those derived from the primers in a), b) or c).

19. Use of a host cell according to claim 8 in the manufacture of a drug for
gene therapy of an LGMD2 disease.
20. Pharmaceutical composition for the treatment of an LGMD2 disease,
characterized in that it contains a component selected from the group of:

a) a nucleic acid sequence according to claims 1 to 4,

b) a host cell according to claim 8,

c) an amino acid sequence according to claims 5 to 7.
FIG. 7

A) Exon 2. Normal sequence: AATCCCCGATTTA; B519: AATCCCTGATTTA
(CGA -> TGA, Arg110 -> Stop).

B) Exon 8. Normal sequence: AGCTGGTGCGGCT; M2407: AGCTGGGGCGGCT
(GTG -> GGG).

C) Exon 13. Normal sequence: TCCTCCGGGTCTT; R12: TCCTCCAGGTCTT
(CGG -> CAG).

D) Exon 22. Normal sequence: CCATGCGGTACGC; Amish and B501: CCATGCAGTACGC
(CGG -> CAG, Arg769 -> Gln).
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:

(A) NAME: AFM

(B) STREET: 13, place de Rungis

(C) CITY: PARIS

(E) COUNTRY: FRANCE

(F) POSTAL CODE: 75013

(G) TELEPHONE: (1) 45 65 13 00

(ii) TITLE OF INVENTION: LGMD GENE

(iii) NUMBER OF SEQUENCES: 4

(iv) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3018 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:



TGATAGGTGC TTGTAAACTG TGCTTAACGA AAACATACCG TGTGCTGTAG GGACTTAACT      60
CTTGTTTATA TCAGTTAGCC TGGTTTCGCT AACAGTACAT CATTTTGCTT AAAGTCACAG     120
CTTACGAGAA CCTATCGATG ATGTTAAGTG AGGATTTTCT CTGCTCAGGT GCACTTTTTT     180
TTimTTAA GACGGAGTCT CTTTCTGTCA CCTGGGCTGG AGTGCAGTGG CGTGATCTGG       240
GTTCACAACA ACCTCTGCCT CCTGGGTTCA AGCAATTCTT CTGTCTCAGC CTCCCAAGTA     300
GCTGGGATTA CAGGCACCCG CCGCCACACC CGGCTTATTT TTGTATTTTT AGTAGAGACA     360
GGGTTTCACT ATTGTTGACC ATGCTGGTCT CGAACTCGTG ACCTCATGTG ATCCACCCGC     420
CTCGGCCTCC CAAAGTGCAG AGATTAGAGA CGTGAGCCAC ATGGCCCAGC AGGACCACTT     480
TTTAGCAGAT TCAGTCCCAG TGTTCATTTT CTGGATGGGG AGAGACAAGA GGTGCAAGGT     540
CAAGTGTGCA GGTAGAGACA GGGATTTTCT CAAATGAGGA CTCTGCTGAG TAGCATTTTC     600
CATGCAGACA TTTCCAATGA GCGCTGACCC AAGAACATTC TAAAAAGATA CCAAATCTAA     660
CATTGAATAA TGTTCTGATA TCCTAAAATT TTAGGACTAA AAATCATGTT CTCTAAAATT     720
CACAGAATAT TTTTGTAGAA TTCAGTACCT CCCGTTCACC CTAACTAGCT TTTTTGCAAT     780
ATTGTTTTCC ATTCATTTGA TGGGCAGTAG TTGGGTGGTC TGTATAACTG CCTACTCAAT     840
AACATGTCAG CAGTTCTCAG CTTCTTTCCA GTGTTCACCT TACTCAGATA CTCCCTTTTC     900
ATTTTCTGTC AACACCAGCA CTTCATGTCA ACAGAAATGT CCCTAGCCAG CTTCTCTCTC     960
TACCATGCAG TCTCTCTTGC TCTCATACTC ACAGTGTTTC TTCACATCTA TTTTTAGTTT    1020
TCCTGCCTCA AGCATCTTCA GGCCACTGAA ACACAACCCT CACTCTCTTT CTCTCTCCCT    1080
CTGGCATGCA TGCTGCTGGT AGGAGACCCG CAAGTCAACA TTCCTTCAGA AATCCTTTAG    1140
CACTCATTTC TCAGGAGAAC TTATCGCTTC AGAATCACAG CTCGCTTTTT AAGATGGACA    1200
TAACCTCTCC GACCTTCTGA TGGGCTTTCA ACTTTGAACT GGATGTCGAC ACTTTTCTCT    1260
CAGATCACAG AATTACTCCA ACTTCCCCTT TGCAGTTGCT TCCTTTCCTT GAAGGTAGCT    1320
CTATCTTATT TTCTTTAAAA AGCTTTTTCT TCCAAAGCCA CTTGCCATGC CGACCGTCAT    1380
TAGCCCATCT GTGGCTCCAA CGACAGCGGC TGAGCCCCGG TCCCCAGGGC CAGTTCCTCA    1440
CCCCGCCCAG AGCAAGGCCA CTGAGGCTGG GGGTGGAAAC GCAACTGGCA TCTATTCAGC    1500
CATCATCAGC CGCAATTTTC CTATTATCGG AGTGAAAGAG AAGACATTCG AGCAACTTCA    1560
CAAGAAATGT CTAGAAAAGA AAGTTCTTTA TGTGGACCCT GAGTTCCCAC CGGATGAGAC    1620
CTCTCTCTTT TATAGCCAGA AGTTCCCCAT CCAGTTCGTC TGCAAGAGAC TCCGGTGAGT    1680
AGCTTCCTGC TTGCTGGCTG GGTTTCCCCC CCACGGAGGA GTCCTCTCAC TCAGCACCTC    1740
CGGCAGCTCA GCTGTGCACA TGGGCACTGG GGGAAGGATC CTGGCAGCAG CTCTGCTGGG    1800
CTCTGTCTTT AAGTGTGAAG CAGGGAGGAG AGGAACAGGT CTCAGATATT TCACCAAATC    1860
TCAGCAAAAT CCAGAGGGAG AGCGCAGGAG GTGGGGTGAT TCTTATGCTC TGGCTCTTTC    1920
TCTCTGAAAA AAAAAAAAAA ATCTTGCTTT TTATAAAAGT GGGTGGAACT CAGTTTAATT    1980
CATCCTGTAA AAATAAATAT TCCTTTCTCA GAACAAATTC CAGACAGCCC AGATGTACCT    2040
GTTCGTTTTA ATATTATTCA TCTTGGTAAG ATTATTTCAG TTTCTCTGGC TAAAATCATG    2100
ATGTTATTCT TCTTTAATTT ACCAATGGCC ATTCTTTCTG AAACACAGAA ACCCTAGAAA    2160
GAGAAGAGTC ATAGGCAAGG AATTTTTTTC ATGCATAAAA TGTTGGGGTT AAAGAGAGAG    2220
AGACCTAGCA ATCGCTTTGG TCCACCTACC TCACCTCATA AGTGAGGAGT CAAGGCACAC    2280
TAGAGTGAAA TATATCTAGT GGGCACATGA CAGAGCCCGG ATTAAAACTT TGTTTTAGGA    2340
AACTCTCCCA GCCTCTGGGT TTCATTTACA GTGATCGCCA GGAGGGAAAT CACATTCCCC    2400
TGGCTCACCT CTCTGATCAT CCCTCCAGTG TGACTCTTGT TCTTAATTCG AGAAATATTT    2460
ATTGAGCATC TACTAGTGCC AGCACTGGGC AAGCAACTGG GGGGACAGCA GTGAGTAAGA    2520
AAGACCAAAA TTCCAGCTGT CTTGGAACCT AGGGTCCTGA AGGGAAGATG GGCATTGAAC    2580
AAGAGTGACA TTGTCAGGAG ACGATGTTCT GGGTGCCACA GGATCATGTG GCAAGGAGAG    2640
CTAACCTCCT CCAGGGAGAC AAACCCTCTC TGAGGAAATG ATGACAAGCT GAGACCCAAT    2700
ACTATTGATT AGCCATGCTT TTCTTTAACC TAAGGTGGGC CAGGCATGGT GGCTCATGCC    2760
TATAAACCCA GCATTTTGGA AGGCCCAGGC TGGAGGATTG CTTGAGCCCA AGAGTTAGAG    2820
ACCAGCCTGG GCAACAGGGT GAAAACCTAT CTCTTTTCTA CTAAAAATTC AAAAAATTAT    2880
CCAGGCATGG TGGCACATGC CTGTGGTCCT AGCTACTCAG AGGCTGAGGT GGCAAGATCA    2940
CTTGAACTCG GGGAGTTTGA GGCAGCAGTG AGCCGAGATC ATGCCACTGC ACTCCAGGCT    3000
GGCTGACAGG AGTGAGAC                                                  3018
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

GATCCACCCG CCTTCGCCTC CCAAAGTGCT GAGATTACAG GTGTGAGCCA CCACGCCCAG      60
CCCACACTGC CCTAACTCTC AAGTTGCATC CTTACTCGAA TAGTATGACA GTGTGGGAAG     120
CAGCATGCGA CAATGTAAAA AGGAGGCATG TTTCTCCCTT CTGCTACTTA CTAGCTGTGT     180
CTCTTTGCAC GAGTTTCTTA ACCTCTCTCG GCCTCAGTTT CCTTATCTGA AAAATAACAA     240
TGATAGTATT CCCTTCACAG GGCCAAATGG AATACTATCA GGAACACTAC ATAATGGAAC     300
TCAATAAATA ATAGCTACTG CCCCCGGCCG CGGTGGCTCA CATCTCTAAT CCCAGCACTT     360
TGGGAGGCCC AGGCGGGTGG ATCACAAGCT CAACAGATGC AGACCATCCT GGCCAACATC     420
CTCAAACCGT ATCTCTACTA AACATACAAA AATTACCTCG GCATCGTGGC CCATGCCTAT     480
AGTCCCACCT ACTCGAGACC CTCAGGCAGC AGAATCACTT GAACCCCGGA GGCAGAGGTT     540
TCAGTCAGCC AAGATTGCAC CAGTGCACTG CAGCCTGCCG ACAGAGTGAG ACTCCGTCTC     600
AAAAAAATAC CTATCTATCT ATCTGTCTAT CTACTGTTAT TCTTACCTGG TCATTTCCTT     660
TTTCTTTCAC AGCAAATTTG CGAGAATCCC CGATTTATCA TTGATGGAGC CAACAGAACT     720
GACATCTGTC AAGGAGAGCT AGGTAGGAAA CTGCCTCAGG TCACATCCTG CCAGATGATC     780
AAGGGGTGAT TACAAGGTGT CATCCCCTTC CAGGAGGTAA AGGGACAATC TGTGCTTGCT     840
TCCAGTAACT TTTTGGAAGA TTTTTTATAA CAGTTGCTTT ATGGTCGTTT ATCTACATGC     900
TGGCGATTGC TTCATTTCCT CCTACATGCC TCTTTAGCAC TCTGCCATGC ATCACAGGGG     960
GTATCTGCAT CCTGTGGCCT CCTCTCCAGT ATCTCAAGGA CACTTACATA CCCCACTCAG    1020
CATGACAAAA GCCCTGCTTT TCACTGTATC GTCTTTCTTG GAAGACAGCT CTGTGACTGT    1080
GCACCAACCA TGCCCCTTGG GCATGGAGAT TCTAGATACA CACACAAAAG GCATCGCCAA    1140
GGAAAGCACT TGTAACTGGA ACCCTTGGTT TAAATTGGCC CAGCATAGCT CCATCTTTAA    1200
AAGAGTCTTT CCACAAAGAT GGCATCCGCC ATGTCGATGA GCATCCAATT TTCTCTTTGA    1260
TTGCTTACCT TGACTGCTCC ATCTGATCTT CCTCTCTCTC GACCTCTTGT TCAGAAAGTA    1320
TTGTCTTTGG TGTGGACTAT AAGCAAGCTC TGTGAAGTAA AATTGGAGAG AACACCAACA    1380
GAAACAATTT AAATTTGAGG AAAAGGGGGC ACCTAAGACC AAAGGAATTT GCCTTATTTC    1440
ATTCCAGAAG GCGAGGCTGA GAATAAATCA GATGAATATC TGGGTTCCTC CACCTGAGGG    1500
AAGGCTTCCT CCAGACCCCT GGGCATAATA ATCTGGGACC TTCAAACCAA TAACCTCTTT    1560
TCCAAGGAAA GACTGGCTGC TTCCAAGGAG GCTAGGGGAG AGTCGCGCTG CAGGCAGCTC    1620
TCAAGTCTCC CCTTGCACAC TCTCAGGTTG GCATTTTCAC TTTAACCCAT CCTCCCTTAA    1680
GAAGCCAGTT CITTGTGACC AGGCTACACC CCCTATTATA TATATATATA CACACACAGA    1740
GACAGAGAGA GAGAGAGAGA GAGAGAAAGA GAGCAAAGTG TTACCTCCAA CTACATACAG    1800
TACTCTGTCA CAAAAGACGT TCAGAGAATA ACAAAACGTC CCGACCTCAT TCCGTTCCCA    1860
CCAATCTCTT ACTCCCCCCT ATAGACCCGT TCCACCCCAG CTCCCTACCT CGCCTTCCTT    1920
CCAATACAAA TCATCTTCGT CGATGGTTCT CTCAGGCTCA CTCTTCGCTG AAGTCAGAAG    1980
AGGAATTGGA CTCACATTGC AAAGGCACAG GGCAGGGCAG ATTTCCTACA GGTGTTAGGA    2040
AGAACAACCC AGTTATGATC ACCTACTGCT CTGTCTCCAT TGACGCCTAA AAAGGAAGTG    2100
ACTTTATACT CCACTTGGAG GAACTGCCTG CAGCCTTGAG CAAAATGTCT AGTCACAAGG    2160
CAGTAAGTTA CCTGTTGATC ATATTGTCAA GCAATTCCTG TCCAATTCTC CTTCCCTGGG    2220
TTGACACCTC TGTAAGGTCA GATCTGGAAG TAGCAGAGTG GGCACCAAGG GAGTCCCCGT    2280
TCAGGGAAGT GGACTGGCTG GCTGGGATTG GGCCTTTTTC TTCCCAGGAG GACCAGGAGT    2340
CCTCACGATC TGTGCCCTGT GTCTCCCTGC AGGGGACTGC TGGTTTCTCG CAGCCATTGC    2400
CTGCCTGACC CTGAACCACC ACCTTCTTTT CCGAGTCATA CCCCATCATC AAAGTTTCAT    2460
CGAAAACTAC GCAGGGATCT TCCACTTCCA GGTGAGGTAA TGAGAGTGTA GTTAAGAGGG    2520
CCAGCGGCAG GCCACCCAGC GCTGGTCTCC TGGCCTTGAC TTCCCAGAAG CTGGAGGAAA    2580
CTTCCCACCC ATCTACCCGC AGCGGCAACA GTCGGCATGG ACCCCCTTAA GGCTTCAAGC    2640
CTGGGAGGAA GCAGTTGCTT ATCTCTGGCT CCCTAATCCC TCCCCCACCA CCTTCCACTA    2700
TGTCCCAGAA AGACAGGAAG ACATCCTGTT TACTGTGGGT CTATTTTTGT CTTTGCAGCT    2760
CTCTGGCTGC TTTTATTCCC TGCAGCCCTT CTCAAGTAGG TCCCTAAGAT ATTAGCACTG    2820
TGACACCACA GGACCCTTCA GGTTGTACAG GAACCCCTGT CCAGGGCTCC TGTATACTTC 2880
TTCCTCTCTA AGGCATGGCG GTACCAAGGC TATCACTCCT CTCTTCCAAG CCCTGGAAGA 2940
AGAGTCTGCT TAACCTGGGG ATCAGGCTTC TTGTTTGCCC TAGAACTGAA TCTGATGGTT 3000
CTAGAATCCA TCCAGCTACT GGAAATTTTC TGGGTCCCAG TCACCTTGGC ATAGAGCTGG 3060
TGCTAGAGCA GAACCAAACT CAATTCTACC TGTGAGGGTC TCGTAGCTTC CGGGATGCTG 3120
GGGAGTCAGC CTCTCTCCAG CTTCAAAGGC TCCCTCATGT CCCAGGATGA CCCACATTAT 3180
CAGTTCTTGC TCCCCGGGTC TTGCACCTGA GCACGGAAGG CCTCAGAAAA GGTCTGTCTC 3240
CAGGCTCAGA CTCCCCCTCC TGCCGCCTTG GGAACATGGC ATATTTAAAG GGTCTCAGAT 3300
CTAAAGGGCG TTACATACAA ATATCAGATA GATTTCTGTT CTCATTTCAA TGAGGGAGAA 3360
AGTGCCATTG AAAAGGAGAC TAAACCACAT TTGGCCCTTT TCAGTTCAAA CTGATTCATT 3420
CAAAAAAGAG CGAGATGCAA ACTTGAAATG ATTGAACAAT CTTCCTGCTA CAGGTAGAAT 3480
AGATTCTGGG TCACTTTGTT CCTCCGTTTC AATCCTTGTT CTTCAGTTTG GCATCAAGAA 3540
ATACCTAAAT CAGCACAGTG CCTTCACTGC ATAGTTCCCA ATCCTGGCCA GATTGAATCA 3600
GCTGGGGGCA CCTGAGAGTG CTGACACCCA GGCCCTGCCC CAGACCTGCT GAGGAGGAGA 3660
ATGAAAATCT TACATCCTAA GACACTCATG CAGCACCTAC TCTACCCATT ACTGGGCTGG 3720
ACTCTGTGGA AGACATGAAG TATATGTAAC TCACTTCCAG CTCTCAAAAA GCAGCCAGTC 3780
CAGTTAGAGA CAGATTTACA CACCCCAAAC ACAAAATAGG ATGAACAGGC ACCCAGATGG 3840
AGACTCCAGG AAATGATGCT GCTTTGGGAT TCAAGAACCC CCTGAGGAAT GTGGAGGAAG 3900
GACACATTTC CTAACAGTAA TTTGAGTATG TGACTCTGTG CGTGACGCTT CTGTGCAGTT 3960
CTGGCGCTAT GGAGAGTGGG TGGACGTGGT TATAGATGAC TGCCTCCCAA CGTAGAACAA 4020
TCAACTGGTT TTCACCAAGT CCAACCACCG CAATGAGTTC TGGAGTGCTC TGGTGGAGAA 4080
GGCTTATGCT AAGTAAGCAA CACTTTAGAA TGTGAGGTGG GGCTAGAGGT GAGAAAGTGG 4140
GTTGCAAAAT CGAGCCGAGA CCTCACTCAC AGGAAGAGGC ATGTGCCTCT ATACGTGCAT 4200
ATGTGTGGGC ATGCAAGTCC AACTGTGACC CAAAGTTAGA GATCAGTTCC AGGCAACAAC 4260
AGCTCTAACT AAAAACATTA AATTTAAGAG TAGAAATGAA GATTTGCATA GAAGACCTTT 4320
AGCTTTAGCT CACCATAGCG AGTTCTTTCA TTGCACCTCC ATGGTGGCAT TGCAACTCTT 4380
GGGATCAGAG CATTGTCCCA GGGTCTCGAT TGGCTCAACC TCATGTGCTT ATAGAAGATT 4440
TATAAAGACA TGTTGTCTCT CAACTTAAAA GCTCCACCCC AGATGATAAT AATGGATTTT 4500
CAAATTTTGG AACAAGGTCA CTCTGTAATG CAGGCTGGAG TGCAGTGGTG CAGTCACGGA 4560
TCACTGTAGA TTGACCTCCT GGGTTCAAGG TGCTCCTCCC ACCTCAGCCT CCCAAGTAGC 4620
TGGGACTACA TGCGGGCATC ACCATGGCCC TTTTATTTTT GTATTTTTTT GTAGAGCGGG 4680
GTTTTCCCAT GTTGACCCAG ACTGTTCTCG AACTCTTGGG CTCATACAAT CCACCAGCCT 4740
TGCCCTCCCG AAGCGCTGGG ATTGCCGGTG TGAGCCACCA CACCGGCAGC TGCTAATGGC 4800
TTTAATGCAG CCCTTCCTCA ACGTTCAGGA TGTAGTGGAA AGAGCTCTCA GGAAGTGGGG 4860
ATAGCTGGGT TTCAATCCCA GTGCTTCTGG CTCTCTGTGG TCTTGGGTGG GTCACTTAGC 4920
CTCTTGACCT CAGTTTCTTC ATTATGAAGA AAGGGAATCA TTGTTTCCAT CCCATGAGCT 4980
CATAGGGTTA ATGTGGAATT GATGAAAGAA CATCACAGCA TCCAAGAGGT AAAGTTCTGG 5040
TGGCAGTGGT ACCTGGGTTT TGTTCCCTGG AACTCTGTGA CCCCAAATTG CTCTTCATCC 5100
TCTCTCTAAC CCTCCATGCT TCCTACGAAG CTCTGAAAGG TGGGAACACC ACAGAGGCCA 5160
TGGAGGACTT CACAGCAGGC CTCGCAGAGT TTTTTGAGAT CAGGGATGCT CCTAGTGACA 5220
TGTACAAGAT CATGAAGAAA GCCATCGAGA GAGGCTCCCT CATGGGCTGC TCCATTCATG 5280
TAAGTCTGGG GTGTGGGGCA CAGGGTGGGG AGCTCCAAGT GTCAGGAAGC CTTTTACCCA 5340
ATGAAGGGCA GCATAGAGCT TTTGTGTGGG ACAGAGCGAA TGTTTTGTTT GAGGAAGCAG 5400
CAACTGGCTC TCAACTTTGA GGACTGGGAA TTTCTCAAGG GAGAACAGTT CTTCCGGATT 5460
TTCAATAAAG ACACTGGTCA AGGACATTTC AAGCCCTGGA ATGTCAGTGG AAATCAGTCC 5520
AGAGGCCTGT GTCAGTGGAG GCCTCCCTTG CTGGTGCTCC TCAGTCTCAG CACGCTCCCA 5580
TTAAGCTGGC CACGTACTTG GCTGTGGACC TGAGCCCACC ATTTCCCTAA GAAAGCCTCC 5640
CAGTCACTGG GCTTTCACCA CACCTCCCCG CTTGAGACGT GGGCTTTGTG TTGTTACCTG 5700
GGAGAAGCTA AGCCTGCAGC ACCTTTCAGT GCAAAGAAAT GCTGTGAACT GAGAGAGGAG 5760
CCAAGGGTAG GGAGATGGCC GCCCATGGCC AGGCCTCCTT CAGGGGGCAT GCCTTCCCTG 5820
AGGGCTGCTC AGTATATTGA TATGATAATC TTAGTGGTTT CCATTGGGGA GGATGGGGCT 5880
GAAGCTGAAT TCCTGCCCCT TCTTCTCCCA ACACGCCCAA TGGACAGCTT GGAAGGTCAG 5940
TTAGCACACA ACACCATGGA TGAACTTTTT TTCTGTATCA CTTTTCTCCG TCTTTCCTCC 6000
ATTCGTGCTC TGTTGATCTC TCCTCTCTCC CTTTGTCTGT CCCATCTCTT TCTCCTCTCT 6060
CCTTCCCTTT CCACCCTTCT GTGTTTGTTC TCTCCCTCCC CTGTGTTGTT CCCTACATTC 6120
TCCATCGGGC CTCAGGATGG CACGAACATG ACCTATGGAA CCTCTCCTTC TGGTCTGAAC 6180
ATGGGGGAGT TGATTGCACG GATGGTAAGG AATATGGATA ACTCACTGCT CCAGGACTCA 6240
GACCTCGACC CCAGAGGCTC AGATGAAAGA CCGACCCGGG TGTGTACACC TCCGATTATC 6300
AGAACTGACC ATCCCTCCAA CCCACATGAC CCCGCCCTAT TAGTGTCAGA CTCCCCTCAG 6360
CAGCCAGGGC CTTACCCACA CACCCCCACC TGGCACCTCC CAAGGGTCTG GCTTGAAATA 6420
ACTTGCTCAG CCAAGGCTCC TGAAGAGGGT GCAAGAACCA GGATTTTGGA GGGAATCTCT 6480
GCTGGAGTTT CTGCATATTC CATGGTCCAG GCAGTTCCTC TCATAACGAA CTATGAGACA 6540
GAAATACTTG TAAAGATACT TCATTTATTT TGAAATATTT TTCCTCTTCT AATGTATTCA 6600
TTTATTCATT CAACACTTAT TTTTGAGCTC CTACTATGTT CCAGGCACTC GTGTAGCAAA 6660
CAAAGCAAAT TCTCTCCTCT TTTTCAATAT TTGTGGAAAA AGGAAGCTCT CCCTCTTGTA 6720
CAGTTTATAT TCTAGTATTT TCATAACTTA TACCTCCTCA CTGGAGAATA CTGAGCCATA 6780
CAGAAAAACA CAGAGGAAAA TTTCACTTAT ATTTTTCCCC ATGTAAAGAT AACCACTCTT 6840
AACATCTAGT ATATGTTCTT CCAGGATTTT TCTATGCACA CACTGAATCT GTATTTTTAT 6900
TTTTAAAATG TTATCATATT CTATGTACCT CTTTCCAGCC TGCTTTTTTC AGTTAGTTTT 6960
TTTGGTTTTT TGGTTTTTTT TTTTTTTTGG AAACCAAGTC TTGCTCTATT CCCTAGCCTG 7020
CAGCACAGTT GTTGCCATCT CGGCTCACTG CAACCTCTGC CTCCAAAGTT AAACTAATTC 7080
TCCTGCCTCA GCCTCCCGAC ATAGCTGGGA TTACAGGCAC ACACCACCAC ACATGGCTAA 7140
TTTTTGTATT TTTTAGTAGA GACGGGGTTT CACCATGTTG GCTGGAATGG TCTTGAACTC 7200
CTGACCTCAA GTGATCCACC TGCCTCAGCC TCCCAAAGTG CTGGGATTAC AAGTGTAAGC 7260
CACCACACCC GGCCTAGTTT GATATTCTTA ATGTGCCCAA AGTATTCTCC TGTAACATTT 7320
TTTAATAGCT ACACAATATT CAAACACACA GATATGTTAT AATTTATTTA CCCAATACCC 7380
TATTATTGGA AAGTTGAGTT CTTTTTTTTC TTTGTTTTGT TTTGTTTTGC TACTATTCTA 7440
AAATGCTATA ACGAACATCC CAATAGATAC ATCTTTGTAT ACATCCATGG TGACTTCCAT 7500
AGGACAGATT CCCAGCAGTA GAATTGCTGG GTTGAATGAT ATGCTTAGGG TAATGACAGA 7560
AGAGTCATTT CAAGCAGCTT CCTAGGGTCT TAGAACTTAA GGATTAATGA GTCTTCCCGC 7620
CCCCTCCCAG TCTATTCAGC ATGATCTGGA TCATGAGGAC TGAGATCTGG AAGAGACTGA 7680
GATCTGGGAG AGGCTGAGAT AGGAAAAGGC GTGGGTCCAC CCATACCCCT CGCCCTGAAA 7740
ACAGCTCTAG GAATTCCGCG GCCTAGGAAG GGTGGGGGAA GCTCCTTTTA AAGCTGTGAC 7800
GTTAGTAGGC ACATGGACCA TAGAGACCTA TCCAGGGCTC ATGGGACTTT AGTGATCCTG 7860
CCCTTCTCCC AAGGATCCCC CATGGCTGCA ACTTGGAAAT TTCTGCAAAT GGAAGAGCTA 7920
CTCCTTAGGG ACGGTCATGT CTGAGCAGGG ATCTCCTCGG GCTTTCTTAG AATTCTCTCC 7980
CTGGGCACTG GGACTCTTGA TTTCTTGAAT ATTATCTTCC AGGTGGCTGT GGAGGAGGTG 8040
AGGGGATGTA AAGAAGGCTA GACTTGGGCA GGCGCAGTGG CTCATGCCTG TAATCCGAGC 8100
ACTTTGGGAG GGTGAGGCGG GTGGATCACC TGAGGTCAGG AGTTCGAGAC CAGCCTGGCT 8160
AACATGGTGA AACCCCGTTT CTACTAAAAA TACAAAAAAT TAGCTGAGCA TGGTGGCACG 8220
TGCCTGTAAT CCCAGCTACT CGGGAGGCTG AGGCAGGAGT ATCGCTGGAA CACGGGAGGC 8280
AGAGATTGCA GTGACGCGAG ATCGCGCCAC TGCACTCCAG CCTGGGCGAC ACAGCAAGAC 8340
TCTCTCTCAA AAAACAAAAA AGAAAGAAAA AAAGGAAAAG CTAAGACTTA CATCTGTCAC 8400
TTAACCCCTT TTCTCAAACC TCTTTCTCTT CCAGGAATAG TCAACCCCTG GATGGCTTCA 8460
GGGGkhQGQG GATCCTGAAG CGCAGGGGAG CCTCCAACTC TACCCCTTCC TCCTTTGAAG 8520
GATAGTAAGG GGTCCAGAAA GGAGGGGCAG GACAGTGTTA CGCACCCCAG ATGCCAGCAT 8580
CCACATTGCT CTCTGATGGT CAGGAGAGAG CCTTCTCAGG GAGACCAGCC TGTCTGGAGC 8640
TGTGTCTCTT GGCACTCTTA AAGGGCCACT GAAGGTCGGT TCGTGGTCGT GAGGCACACT 8700
TTGAGGGAGC AGAGTGGTCT GTGTCTTCAC AGAGCCCGGA AAATGAACTA GTATGAACTT 8760
TGCCTCCAAG CAGCAGAACT TCTGTTCCCC CGCCCCTAAT GGGTTCTCTG GTTACTGCTC 8820
TACAGACAAT CATTCCGGTT CAGTATGAGA CAAGAATGGC CTGCGGGCTG GTCAGAGGTC 8880
ACGCCTACTC TGTCACGGGG GTGGATGAGG TAAGCCTGGT GGGGCTTGGT GGGGCAAGGG 8940
CACCCTCCTG GGTTAACCTC ATGAAGTCAG GACTTAGCTG TTGGGGCCCC TGCCCTGTCT 9000
GCAGAGCTTG CCTCCAATCA GGACATTCAG TTCAAGGTCC AAGCCACGCC TGGGAGCAGA 9060
GGGGCCTGTG AAACTGGTAG AGGTGGATCC TGCCACAGTT GGTGCACAGT TTATCTTTGC 9120
TTTTCGTGCT AAAGATGGCA ATTTTTCCAA CATTTCCAAT GAACAAATTG AAATATCACT 9180
TAACTTTGCT TTTACAAAGT TGGTTTCATG TGTTCTTGAG CTTCCTGTTC TCTCGTGTTC 9240
AGATAGCTAC AGTTGTCTCT GGGTAGCCAC GGGGACTGGT TCCAGAAGCC CCAACAGTAA 9300



FIG. 8B/6

SUBSTITUTE SHEET (RULE 26) 




WO 96/16175



PCT/EP95/04575 



CAAAATCTGC AGATGCTCAA GTCCCTTCTG TAAAATGGAG TAGTATTTGC ATATAACCTA 9360
TGCACATCCT CCCATATACT TTAAGTCATC TCTGGATTAC TTACGATACC TAACACAATG 9420
GAAATGCTAT GTAAATAGTT ATTGCACTGC ATTGGGTTTT TTTGGTATTA TTTTCTGTTG 9480
TTGTATTATT ATTTTTTCTT TTTTTGAATA TTTTTGATCC ACAATTCGTT ATATGGGAAA 9540
GCCATGGATA CGAGAGGCTG ACTGTTCTGT TTTGCTCCTT CTGGGACTTC TGGGTTTTCC 9600
TGGACCATGT CTGAGACAGG AACGTTGTAA GACCTGTTGC ACACAGTTGG GCAGGTTGTG 9660
CCCTGTACAG AGGGATGGGC TGAGAGGGGC AGTTGCCTGC ATCACCCATT GCAGCAGACT 9720
GGAGGGAGTG TGCTTGTTTG TAGTTCCTCA GTCAGCAGGG GCCTTTTGTC TTTCCTTCCT 9780
TTCCTTTTTT TTTTTTTTTG AGACGGAGTC TCACTCTGTT GCCCAGGCTG CAGTGTAGTG 9840
CCACAGTCTC GGCTCACTGC AATGTCCGCC TCCTGGATTC AAGCGATTTT CCTGCCTCAG 9900
CCTCCTGAGT AGCTGGGATT ACAGGCGCGT GTCACCATGC CCACCTAATT TTTGTATTTT 9960
TAGTAGAGAT GGGGGTTTCT CCATGTTGAT CAGCCTGGTC TCGAACTCCT GACCTCGTGA 10020
TCCCCCCACC TCGGCCTCTC AAAGTGCTGG GATTACAGGC GTGAGCCACC ACGCCTGGCC 10080
AGCAGGGGCC TTTTTTCTAA TTTATATGAA GACACCTAAT TTATATGTGT TAGCAAAGGG 10140
CTCCTGTTTA TGCCTCACCT CCTCCCCCGA AGCTCATACG GCAGGATGTT GGTGAGAAAA 10200
TTGCCTCTTA GAAGATAGAG AGGAGATGCG AAGCCTAAGT TAGGCAGACT CAGGAGGATA 10260
CGTCTGACCC ACCCCCTGCC ATTCCCCAGC ACACTTGTGA TTAATCTCCT TGGCGAGAGG 10320
CAGGCAGAAC ACCCTCGCGT AAGAGATTTG CCCCCCAGCC CCGTCCCAGC CCTCAGCTAG 10380
ACAGAAGATT CCCTTTCCAG AGAGGCTGCA GAGCATGAGA GCTCTTTCTG TGTGCTTAAG 10440
GTCCCGTTCA AAGGTGAGAA AGTGAAGCTG GTGCGGCTGC GGAATCCGTG GGGCCAGGTG 10500
GAGTGGAACG GTTCTTGGAG TGATAGGTAG GTGAGGGGAC CCCACGGGAT TGGCGGTGGC 10560
GGGGAACAGG GTCCGGGACA AGGCTGTGTT GGGAACTGAG CCATGAGAGT ATTGAAGATG 10620
CTTGGTATAA AATCACCCTC AAAACCAATG ATCCGCAGAG AAGAGGGGCA CAGGTGTTGG 10680
CTCCAGGGAA GGGCCAGGAG TGGAAGCGGG GTGCTGGGGA CCCAGAGAGG TTGCTGACAA 10740
CCATTGGCTG GAAAGGAAGG ATTCCAGAAA GCGTGGGGAA GGTCCAGGCA GGAAAAGCGT 10800
ATGAATGCAG GGTTCTGGGC TAGAGAAGTG ACTTCCCTTC TTGGGGTCTT GTGTTGCCTT 10860
TCCTGTGAAA TGGGAACAGT ATTATTAGCA CTTACCTTGT GGGCTGATAT TGAGGAGTAA 10920



FIG. 8B/7

CTGGGACTTG TTTTTGGGCA AGTGCTGAGC CATTGCTAAG ATTCCCCTTA CCCGTGCTTG 10980
TCCCTTGTAT TAAGGCACAA GGGCCCTTTG AAAAGAATTT TACCTGCTTT ATCAATTGAA 11040
AGGGATTAAG ACCTTGGGGG CCAACCCAAA ATAAACATGC GAACTTATTA TTTATAGGCT 11100
CCATGCACAC TTCGTAAAAC CTCCATGGTC CTACTGGTTC CTGATTACCT CCACTCAATG 11160
AGAGGCAATT CATTACTGAA TGAGCCATAA GCGCCTCTTA TTTCGAGAGG GGGATGGCAG 11220
CACTCAGTCG AGGAGAAGGA CCGCACCCAG GCAGCCTGGG CCCCTCGGCT CCTGTACTTA 11280
TTTACTCCTG GGTACTTCCT AGCCCAGCAT GTAATTACTG GTTCGTTCAG TCATTCGTTT 11340
AGTAAATGTT TCTTGGGCAC CTACTACATA GGAGGCACAG GTCAAGGCAC TGGGGATATT 11400
CTTTCTACCC ACCCCCTCCC TCCCTACACT CTGATTAGGG ACTGACCGAT 11451
(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1834 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
ATTTTTTTTT TTTTTTTTGA CACGGAGTCT CACTCTGCCA CCCACGCTGG AGTGCAATCG 60
CGCGATCTTC GCTCACTGCA ACCTCCGCCT CCCCGCTTCA ACTGATTCTT CTGCCTTAGC 120
CTCCTCAGTA CCTGAGACTA TAGGTGCCCC CCACCACGCC CAGCTAATTT TTGTATTTTT 180
ATTACGACGG GGTTTCACCA TATTGGCCAG CCTGCTCTCC AAATCCTGAC CTTGTGATCC 240
CCCCACCTCC CCCTCCCAAA GTGCTGCGAT TACAGGTGTC AGCCATTCCC AGCAGCCCAG 300
AACTCAATTC TTAACCTTTA AAGTATGATG AGAAGAAGGA TCAAGCCCTC ACCAGCCCAT 360
TTAAGGACTT TACGCTCACT CTTGAGGATG TCACAAGTCA TTCCTATTGG GTTTCACACT 420
GAGGTTAACA GGTGAAGTCA GCATTTTGCT AGTTCACAGC AGCTGCAACT CTTTGTATTT 480
CTCTGATACC TCCTGTCCCA ACCTACATCA GGCCTTCCCT TCTTCCTCCT TCCTTAATTC 540
CTCCATTTTC CCACCAGATG CAAGGACTGG AGCTTTGTCG ACAAAGATGA CAAGGCCCGT 600
CTCCACCACC AGGTCACTGA GGATGGAGAG TTCTGCTGAG TCCAGAACCC AGGAAGACCC 660
AGAAGGGTAA GGGTGGGGAA GAGAGGGGAA ATCTCAGACC TCAGTCCCCA GCTAAGGTTA 720
TCAGATTCCA GCCCTTGGGA GATCTTGGCT GTGTTCTCCT CCAGCCCAAG GCCCAGCAAG 780
GATGAGCTTC TGAGAGGAGC CTTCCAGGCC ACAGGGACAA TGAGCCCAGG ACCAGGCCAA 840
CATGACATGG CTCTTGCCTC CTGTGTGCCC CTCCGCCACA CACTCTATTC CAGCCACAGG 900
CACCCTGGCC TTAGCACAAT TCTTTTCTGA GCCTAGGAAG CTCCACTTAC CCTGATCTTC 960
CAACCTCAAC CTCACCCTCT CTCAGGTTGT TTCTATTCAG GCTTCAAGTC TCAGCTTAAG 1020
GAGAATTTTC AAGTCTCAGC TTAAGGAGAG CCCCCTAAGT TCCCCGAGGA CTGGGATTAA 1080
TTTATGATGC TCATCACCCT TAAAATTGTT TGCTTAAGCC GGGCGCGGTG GCTCACGCCT 1140
GTAATCCCAG CACTTTGGGA GGCCGAGGTG AACGGATCAC GAGGTCAGGA GATCGAGAAC 1200
ATCTTCGCTA ACACGGTGAA ACCCTGTCTG TACTAAAAAT ACACAAAAAA AGTAGCCGGG 1260
CGTGGCAGCG TGCGCCTGTA GTCCTAGCTG CTGGGGAGGC TGAGGCAGGA GAATCACTTG 1320
AACCTGGGAG GCAGAGGTTA CAGTGAGCCC AGATTGCGCC ACTGCACTCC AGCCTGGGCG 1380
ACAACAGAGA CTCTGTCTTG GAAAAAAAAA AAAAAATGTG GTCTTAGTTT AATGTCAAGG 1440
GAAAGGTTTT GGGTGTTrTT ATTACTTTAT TTTTTATTTA AAAACTATAA TAGAGACGGG 1500
CCTCCCTATA TTTCTCGCGC TGCTCTCAAA CTCCTGGGCT CAAGCGGTCC TCCCACCTTG 1560
GCCTCCCAAA ATGCTGGCAT GTGGGCCTGG TCAACATATG CGACCCCAAC TCTACAAAAA 1620
ATTTTAAAAT TAGCCAGATC TGGTGGCGTG TCCCTCTAGT CCCAGCTACT TGGGAGGCTG 1680
AACCAGGGGC TCACTTGAGC CCAGGAGGTT GAGGCTGCAG TGAACTATGA TTGTCGTTCA 1740
CTTTTCTTCT CAACGTGAGA TTAAGTGTAG TCAGCAATTT GGCTTAGGAT TATTTATTCA 1800
CAATTTTTAA CCGTCACGTT CCGCCAAACC AGCT 1834
(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14664 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
AGGAGGTGGA GGTTGCAGTG AGCCAAGATC ATGCCACTGC ACTCTAGCCT GGGCAACAGA 60
GCGAGACTCT GTCTCAAAAA ATACACACAC ACACACACAC ACACACACAC ACACACACAC 120
ACACACATAT ATATACACAC ATATATATAC ACACACATAT ACACACACAC ACGTCTGTAT 180
ATATATGTGT GTGTGTATAT ATACACACAC ACACTATTCT ATATATTCTT GTAGAGCTAT 240
CTGTGTCTCC TCTGCTATTG AGCATGAGCC CHliimT T TlilililT TTGAGACAGA 300
GTCTCACTTT GTCGCCCAGG CTGGCATACA ATGGCGCAAT ATCGGCTCAC TGCAACCTCC 360
GCCTCCTGGC TTCAACTGAT TCTCCTGCCT CAGCCTCCCA AGTAACTAGG ATTACAAGTG 420
CCCGCCATAA TGCTCAGCTA ATTTTTGTAT TTTCAGTAGA GATGGGGTTT CACCATGTTG 480
GCCAACCTCG TCTCAAACTC CTAGCCTCAG GTGATCCACC TCCCTCAGCC TCCCAAAGTG 540
CTGGGATTAC AGGCATGAGC CACAGCACCC TCGTGAGCAC TAGAGCTTAT TTCTTCTATC 600
TAACTGTATT TTTCTATCCA TTAGCCACCC TCTTTTCATC CTCCCCTCTC CTTCCCTTCC 660
CAGCCTCTGG TAACCACTGT CTGCTCTCTA CTTCCATGAC ATATGCTTTG TTTTAGCTCT 720
CACATATGAG TGAGAGCATG CGACATTTAT CTTTCTGGCC CTGGCACATT TTTGAATCAT 780
TGTTAGAAAA GATGATGGTT TGGACTAGAT ACATCAGAAG TGACAGCCTT TGCCCTAAAA 840
AGGAAAGACA GGCTCCTCTG GGACCCTGAC CAAGTTCCTG TGAACTATTT TATTATTGTG 900
CTGTGTTAGT CCTGGGGTCT TCCGTTCCCA GCCCTCCTCA CCTGCTCCCA TATGGCTCTC 960
TCTCTTCTTC CAACCTCTCA GGATGTCCTA TGAGGATTTC ATCTACCATT TCACAAAGTT 1020
GGAGATCTGC AACCTCACGG CCGATGCTCT GCAGTCTGAC AAGCTTCAGA CCTGGACAGT 1080
GTCTGTGAAC GAGGGCCGCT GGGTACGGGG TTGCTCTGCC GGAGGCTGCC GCAACTTCCC 1140
AGGTGGGAGA TGCTCTTGAT GGGGGGAGGG TCTAAGCCGA AAAAGTTCCA GGCAGAAGAA 1200
GCCTAACTAG TGCTTATTAA GTCTCTCTGT TCCAGACGTC CACTATCTTA TTAAACCTTC 1260
CCTGTTTTAC TGAGAAGGAA ACCACCATGC TGAGAAGTTT GCAATAGGGA GCTGGGTAGC 1320
AACTTTGGAA GCAGGAACTT GTGGGAACAA TGCAGATGCT GCTTGGACTT ACGATGAGGT 1380
TATGTCCAGA TAAGCCCATC CATCTTTTGA AAATACCCTA AGTGAAAAGT GCATCCAATA 1440
TGCCTAACCC CCCAAACCTC ATAGCTTACC CTGGCCTACC CTCAAACATT GCTCGGAACC 1500
CTTGACCTTA AGCCTAAAGT TGGGCCAAAT CATCTAACTC CAAAGCCTAT TTTACAAAGA 1560
AAGTTGTTGT AATATCTCCA TGTAACTTAC TTAATACTTG TACCTAAAAA GTGAAAAACA 1620
AGAATGGTTG TACGGGTACT CGAAATCCAG TTTCTACTGA ATGTGCATCT CTTTCACATT 1680
GTAAAGTTAA AAAATTGTAG CCGAACCATC CTAAGTCAGG GACTGTGAGT ACTGTGTCAG 1740
TAACAGTAAG GGCACTATTG GAGAACCAAG TTAGCAGCTG CTGCAATAGT TCAACTCAGA 1800
GATGATGAAA ACCTAGACCA AGTCAGTAGC AGCAGAGATG GAGGGGAGAC AGCAGATTTA 1860
GGGAGACCAT ATTGGGTGAT GTAGGGAAGG AAGAAGAATG ATGTCAAGAT TCCCAGTTCG 1920
GGACCTGACA ACATTGCAAG ATAAGACAGA CAAGAAGATG GCGTCGCTGC CTCATGCCTA 1980
TAATCCCAGC ACTTTGCGAG GCAGAGCCAG GAGGATCACT TGAGCCCAGG AGTTCAACAC 2040
CAGCACAGGC AACATAGTGA CACCTCATCG TTACCCAAAA TAAAAAAAAA AATGAGCTCG 2100
GAGGATTGCT TGAGCTCGGG AGGTTGAGGC TACAATAAAC TGTGATCATG CCACTGCACT 2160
CCTCCCTGGG TGACAGAGTG AGACCCTGCC TCAAAAAAAA AAGACACACA AGAGAAAAAT 2220
ATCAGCGTGT TGTTTGTTTT TGGTGGAGTT AATTGTGGGG TTCTAGGGAA AGGAATTTAG 2280
CTTGGGACAT GGAAAGTTTG AGGTTCCTGT AGAGTGTCCC AGTGAAGATT TGTAATAGAG 2340
CATCGGATGC GCATATTAGA TGGCACTTGG TGATATGATA AGAACTCAAA AAATATTTGA 2400
GGAATAAAGG AAAGAAGAGG CCAGACGTGG TGGCTTATGC CTGTAATCCC AGCACTTTGG 2460
GAGGCTGAGG CAGGCGGATC ACTTGTGGTC AGGAGTTCGA GACCAGCTTG GCTAACATGG 2520
TGAAAACCCA TCTCTACTAA AGATACAAAA ATTAACCGGG GATGATGGTG GGTGCCTGTA 2580
ATCCCAGCTA CTTGGGAGGC TCACTCAGAA GAATCGCTTG AACCCAGGAG GCGGAGGCTG 2640
CAGTGAGCCG AGATCGCGCC ACTGCACTCT AGCCTGGGCA ACAGAGCCAG ACTCCGTCTC 2700
AAAAAAAAAA AAGTGAGAGA GATTGAGGCT GGGATATATG GCTCAGGCAT CATGCGCGTG 2760
TAGGGGGCAG TTAAAAAGCA GAAGTAAGAA AGATTGCCTA GGGAGGCAGG AAGGGTGAGG 2820
TGAGAGGAGA AGAGGCCCAG GACCAGATTC TAGTCACCAA CAGCGTTTAA GGGGGAGGTA 2880
AGGAAAACAA AACCATCAGG AAAGACTGAG AATGAAAGCC CAGAGAGGAA CGAAAAGCCA 2940
CACATACAAT CAGTACAGCT CCATCTGAAT AAAGGTAGCG CCCCCCCCCC CCCAAATCAT 3000
TAGAGAAATG CCTGATTCGG TTTTCTGTGG ATTTTTCCTA AGAACCTAGA TGTGCGGAAT 3060
AGAAATAAAT GGTTCCCTCT GTCTCATCCC CTCCCTGCCC TCTGAGAGGA AGCTGTGATT 3120
GCGTGCTCCC TTTCTGGGGG TGCAGATACT TTCTGGACCA ACCCTGAGTA CCGTCCGAAG 3180
CTCCTGGAGG AGGACGATGA CCCTGATGAC TCGGAGGTGA TTTGCAGCTT CCTGGTCGCC 3240
CTGATGCAGA AGAACCGGCG GAAGGACCGG AAGCTAGGGG CCAGTCTCTT CACCATTGCC 3300
TTCGCCATCT ACGAGGTGTG TAGTCCTGAT TGGCTCCAGC CCAGGAAACA TACTTTCCCA 3360
GAGAGGACGC TTCCAGGGGC TTCTAGACGG GCCCTCTGCT TCCTCAATAG GACTGACCCA 3420
CAGAGCTCCT CGTATCAGGA CCACTTGTGT TTGTAACAAG CAAAAAATAC CAGGGGGGGC 3480
ATTAGAGAGG CAGTGGAGCC GGCCTGGCAG AACAGGTGCC TGGGGGTCAG GCTTCCGCAT 3540
GCGGGCTGCA GTTCCTGGCA TTGCCTTCCG CAGGCTCCTC ATGCTCATTC ACATCTGAAG 3600
CATCTTCCTT TCTCTTTCTT CTCAAGGTTC CCAAAGACGT ATAGCAGCAG CAGCGGCCAG 3660
CACTTGTGTG CAGCACTACC CAGGGGGGCC CGAGTCTGTC TGTGGCTCGT CGAGAAGCTT 3720
CCTGGTGGGG TTTGTGGGCA CGACTTGTGA TAGGAGAGGG CCTTGCCTGT TGTTATTTCC 3780
CACTTGCAGA GCAGGTTGCC TCAGGGCATT GCATCACCCA TGACTACCAC CCCCAGGATG 3840
TCCACTTTCT CCCTCGCACC AGACACTGCA CGTCACACAC ATGCCTTTGC AGACTCACCC 3900
TCCTCCACGC TTACAGCCAC ACACACAGTC ACACAGACCC GTTCTGAGGG TGGCTGCCCG 3960
CTTGGGATGG AGGAATCACT TCCCTCAGAA CCCAGCCAAG TCCTCTAGGG CTCCTTGGGG 4020
GTCCTTCCAG CCTGAGGGGC TTCGGAGCTG AGCACAGCTG TTCTGGTAAG TGTCCGTGAG 4080
TGTGGGGATG ACACATTTCC ATTCACTCTG AATCACAACA GAAAAGGGAA GAGGAATTGA 4140
GGTAGGGAGC CTATTTAACC CTTGGGAGTC GGGAAGTAGG GAGGTTGAAA CTGTGACATG 4200
CGTGACCAGG GAGTTGGGAA GGGACCCTTG GAGGTGGCTG TGGCAGGACA GGACGTTCCT 4260
CCCGAGGGGC TCATGTGCCC TGGGCTCTCC CCATCTCTCA GATGCACGGG AACAAGCAGC 4320
ACCTCCAGAA GGACTTCTTC CTGTACAACG CCTCCAAGGC CAGGAGCAAA ACCTACATCA 4380
ACATGCGGGA GGTGTCCCAG CGCTTCCGCC TGCCTCGCAG CGAGTACGTC ATCGTGCGCT 4440
CCACCTACGA GCCCCACCAG GAGGGGGAAT TCATCCTCCG GGTCTTCTCT GAAAAGAGGA 4500
ACCTCTCTGA GTGAGTGCTG GCCCAGCTTT CCCACGTGTT TCTAAAAGCT CACATGGCCC 4560
ACTCCAGAGG TTGAAGGCAT GAGGCAGCTA GACACGTCTC CTCCAGGGTC CTTCTGCTGC 4620
TCCTGAGCCA CTGGCCACAT TACCCCCATT CATTCATTCA TCCATTCTGT GATATTTATT 4680
GAGCACCTAG TATGTTCCAG GCACTGTCCT AGGCACTAAG GATAGAGTAG TGAAGTAAAC 4740
AGAAAGAAAT CCCTGCCTTC ATGGAGCTTA ATATTCTAAC ATGAGACAAT AATGGATAGG 4800
AAAAACATAT GTAGCATGTT AGATTTGGAG AGGTGATATG GAGCAAAAAT AAAGTAGGGA 4860
AGAGGGATAG GAGGTGTTGG GGATGCTTGA AATTTTAGGT TAGCATGGCC AGGAAAGCGA 4920
CATCCTGTCC CTGGCCACCA CAGATGAGCT CATAGCCCCT GCCACTCTGA TCTCTGTCCT 4980
TGGAAGATGC ACCAGGTCCA TGGGTAGGTG GCTGGGTCAT GCCTTTGGGG GGCTCTGAGC 5040
AATACTAACA AGAACCTCCG TGCCTGGGCT TGGCTGTCGG GGATGGTGCT GACATCGGGC 5100
TGGTTCCTGG GGTTCGCGTG TTCCAGGGGT TCTCTAGAGG CTGGTTCTGG CTTGGCTGCC 5160
AGGAAGCCGT GCACCAGAGC AAACCGTCCA CGGGCCTCCT GCTTGCTTCT GGTGACACTG 5220
AGACCCCACA TGTCTCTATT CCTCACAGGG AAGTTGAAAA TACCATCTCC GTGGATCGGC 5280
CAGTGGTGAG TCGTTTACAT CTTCTGTGCG AAAAGTCCAG AGGGTCCCCT TCCCTGACCA 5340
TGCAGGGGAC AGATGGTGCA GGGGAGAATG GGCACTGGCA GAGGGAATGG GAGTCTGGGC 5400
TGTGCTGAGC AGTCCCTCCT TGGCACTGCA AATCCTACTT TGGCATGGCC AGAAGTAATC 5460
GGCCTTAAGC ACCGGGGGCC ATTGAGGCAG TTCAGGGGCT GGGAAATATG GAAGAGGGTC 5520
CTGGAAAGGA GAAGCAATTT GAACAATCGG AGGGAACAAG GCCACAGGAA GGGATGACAA 5580
GAGCCGCAGC GAACACTGGA TTCTGAGACT GGATAACATT GGATTTCACA CATAGAGAAA 5640
AGAAAGTAAG CTGGTGCCGG ACCTGGTGTT GACACTTGGA TCCTCCACTT ACCAGGGGGG 5700
TGACCTGGAC AATTTCTGTA ATCCCTCTCA CTCAGTTTCC TACTCAGTAA AACGGGGATG 5760
ATAATGTGCC TTGCAAGGCT TTTGTGAGGC TTCATCAATG AGGTGATGTA TGTGAAGTGT 5820
CTGGCACAGC ATGGGCACTC AAACAGAGGT GCTTTTTCAC ACTTTACACC TTACAAGGTA 5880
CTTTTCACAT GTGTCATCGC GATACTTGCA AGGTTGCTGA GAGGTAGATG GGGTTATAAT 5940
CCCTGGTGTT CAAGAAAGGA AGCAGAGGCT CAATGGGGTT GAATGACTTC TCTGAGTTCA 6000
CAGAGCTCAG TAAGTGGCAG GGTTTGGAAC TCACATTCAG ACTCTCTGAC TCCAGACTTA 6060
GGTTTTTCCG CACCTCCACG CTGAGGCCAG CCCCAGGCAG TGAGAAGCCC AAAGTCCGAA 6120
GCACAGAGTG CTGTGTGTTG GGCTCTGTGT GTTGAGGAGT CTTGTGACTG CCTTGGGGCT 6180
TTGGGCTGTA GTCAGCTGAC AGTCCTTTGT GCTCTGTGGG GATGACGTAG GCCAATGGGA 6240
GGACAAATGC CCCTCTGAAC TGTCTTCTGG GCAGTGACAG TCATGGTCAT AATCCTGACC 6300
CTGAGCCAGT GCCAGCTCTC CAAGTGCCTT CTGAATGACC ACAGCCGATT GGTTTTAGTG 6360
GTAGGTGCGT CGGGATCTGT TCTGGTCATC TGGATGCTGG TCATCGGGTG CAGTATTGAT 6420
CAGGACCTGC AAACCCAAAA GCTTATGGGA GCTGGCACGT CACGTGAGTA GAGCAGGCAG 6480
GTGCAGGGTT TTTGATGTCC CTGCACTGAC ACAGTTGTCT GCAGTTCTCC AATTTGACAT 6540
TTGGGCTCCA GTGTCGAGGG TCAAACAAGG AATTTTGGGG CGTGGGCCAA ATCTGGGAAG 6600
ACACAGGCAG CAGGGCCCTT TGGCTCAAGC TGATAGTTGC CGCAGGGATT ACCAGGCCCA 6660
CGGCAGCCTC CCACAAGCTG CGGCTTTTAC CAAAGAAAAT CTCCCTATGT TAAATGCTTG 6720
CTCAAAAATT TTTAAAAAAT ATTCTGTAAG TCAAAATCCA TTCTTAGGTC AGTTTGAGAG 6780
ACCCATGTTT TTCGTCTTTT ACTAACCAAT TTCATTTTTT TATTATTTAT TTATTTGTTT 6840
ATTTTTGAGA CGGACTTTCA CTCTTGTCAC CCACCCTGGA GTGCAATGGC ATGATCTCAG 6900
CTCACTGCAA CCTCCGCCTC CCGGGTTCAA GCAATTCTCC TGCCTCAGCC TCCTGAGTAG 6960
CTGACATTAC AGGTGCCCAC CATCACGCCT CGATAATTTT TGTATTTTTT AGTCGAGATG 7020
CGGTTTCACC ATGTTGGCCA GGATAGTCCT GAACTACTGA CCTCAGATAA TCCGCCCACC 7080
TCAGCCTCCC AAAGTGCTGG GATTACAGGC ATGAGCCAGC ACGCCCGGCC ACCAATTTCA 7140
TTTTTTAAAA AAGGAAGAAA GAAAACCTTA GCCAGAAGAT CTTTTTCCTT GCCATATGCA 7200
GTAAGAGTAG ATTATAAAAA CAAAGTCAGA GCAGTCACTC GTGTCTGGGC ATGGAGGAGA 7260
AAGAAGAATT CTCTTCTCCC TTCACCCTCC ATGCCCCTTT TTGGCTCCAT GTGATTCAGA 7320
TTTCTGGACC CTGGACCCCC ACCCCAAGCT AAAGACCAGG ATACAGGGAA GCCACAACCA 7380
CTGGCGGTTC TCAGAACTTA CTTTTCACTT ATTCTGCATT TACTGTTTCC TTTTCTTATG 7440
CAGAAAAAGA AAAAAACCAA GGTAGGTGTG TGGGTAGAGA GCATGAAGTG TGTGTACTCA 7500
TGCATATGTA TGTGCATGCA TGTGAAGTGT GCATGTGTGA GCTCATATGC ATCCATGCAC 7560
CAGACTTGCC TCTTCCTCCC CCTCCTTCCT GAGCTTCTGC TGGGGCCGAG CGTGCAGTAA 7620
TGACAACTAC GATTTGCTGG GGGAAGGCTA CGTGCCAAGC ACTCTTTTAG GTGCTTTCCA 7680
TGATTAATTC CTTCCTCACA ACAGGCCTAT GAGATTAGTA CTATAACTAT CCCGATTTTC 7740
AGAGGGAGAA AAGGTACAGA CTTGACTAAC TTGCCCAAGG CCACACAGCC AGAGAGGGGC 7800
AGAGCCAGTA CTTAGAGCCA GGCAGTCTGG GTCCAGAGTC CGTGTCCTGA ACCAGAAGAG 7860
GCCATCATAG GCCATCAGAT TTGGTGCTAG CATTTCTGGT GGTGCCTGGT GGTGATGGAT 7920
CCATCACAGG GGTCCTCCAG GTACTGGTGC TGGCGCAGAC CAGAGCTGAC ACTCCTCAGG 7980
CACTACCACA TTCCAGCCAC TGTGCTTGGG GTCAGTCCCT CTCTTTTTTT TCCCCCCCAA 8040
TTATAACAGT ATCTACAAAG TAGGTGCTGT TATTTTTCCC CTTTCACAGG TGAGATAGAC 8100
TCAAAGAAGT GAACTTGCCC AAGGAACAGA ACTAATGAGT GGGGAAAATG GAAGTGGAAA 8160
CCATGTCTGT TTACTCCAAA ACCTGTGTTT CTTGCCCTCT TTCTCTGATG CCACCCCCCT 8220
ACACTTCAAG GCCTGTGTTG TCCAGACCCA CACTCGGGCC TGCCAGTCTG TGCCTGGCAG 8280
CGATGCTCCA TGGCCACACC ATATCCATCC TACACATCCC CCCTCAGACT GTGACCTCCA 8340
TTTGCTCTGG GATCCCCACA AGCTTCAGCT GCTTGAGCAA GACACTGCTT AGAAGCCAGA 8400
GCAAGCCAAG GCCTCTGGGG CCTGCTGGGA GCCAAAGCTG GGGAGCCCTT TCCACGGGTC 8460
TATCTGCTTG AGCTGTCCTA GATGAGCAGC ATGGAAGGGC AGTGGTGCAT GAGTCCACGC 8520
GGGCTGCTTT TCTGCTCCGA GAGGCTCTGC CTGCCCAGTT CTTCTCTGCA TTGCAGCCTC 8580
AATCCCCACA GCCTTGCCTT CCCCCGGCTT TCCCTACAGG TGCACCGCAT CCACAGTGTT 8640
GGCACCATGC AGCAGCCGCT CTCCGTCCTT TTCATATCCT TGTCACTTGC ACGAGCATGT 8700
CTTGAAAATA TCCCTTGTTT GTGTAGCATC TTAAATGTTT TTGCAGTATG ATTTTGCATT 8760
CAGTATCTCA TTTGATCCCC ACAAGAGCCC TATGAGGAGG GAAAGCAGAT TTTACCATTA 8820
AAGGATGAGT AAACTGAGGC CAGAGAGGAT ATTTTTGGTT TTTTTTGAGA CAGTCTCACT 8880
CTGTCACCCA GCCTGGAGTG CAGTGGCTTG ATCTTGGCTC ACTGCAAGCT CCACCTCCCA 8940
TGTTCACACC ATTTTCCTGC CTCAGCCTCC CAAGTAGCTG GGAGTAGAGG CACCCACCAC 9000
CACACCCAGC TAATTTTTTT GTATCTTTAG TAGAGATGGG GTTTCACCCA GTTAGCCAGG 9060
ATGGTCTTGA TCTCCTGACC TTGTGATCTG CCTGCTTCGG CCTCCTAAAG TGCTGGGATT 9120
ACAGGCGTGA ACCCCCCTGC CCGGCCAGAG AGGATATTTC TTAATGAGGG GCAGGGCTGG 9180
GATTCCAGCC CAGTGTTCTG ATGGCTCACC CACTGACCAT TCCACTAATC CGTGTCCTTT 9240
TTCAATCTAA ACTTTCAGGG TTGTAGAGGT TCCTTTGAGG TGCCTCAGTA CTTGCATGGT 9300
GATGTGGGGT CTGAGGGCCA AGAGCTCTGT TCTCATTAAT CAGAGAAGCT TGTGTTTTTA 9360
AAAACAGCAT GTTTACTGCA GGAAATTTAA TTGGACAGTG TTTCCATCTG GAAAAAAAAA 9420
AGTCTACAAA ATACTTGACA ATCACTGCAC TAGATCATGC TGCTTTTAGC ATTCTTAGCA 9480
TTTCACGTGC TGAGCTCTCA ATACTCTACC ATGAGGAGGG ATGGAGTGGG TATGAAAAGA 9540
TAAAGAAGTG AAGTCACACG GCTTGTCAGT GGCAGAGATA GAGCTTGAAC CGAGGTTGAA 9600
GAGCTGCCGC CTATTCCTTT CCTCTTCTCA CTGGATAAAG CTGCTCCAAG AGAGGTGCTG 9660
CCTCAGTGTG CCTGTTCAGA CTGTAATCCT CCCTTCCTTC CTGCCTCCTC CCTCCTCTCT 9720
CCAGCCCATC ATCTTCGTTT CGGACAGAGC AAACAGCAAC AAGGAGCTGG GTGTGGACCA 9780
GGAGTCAGAG GAGGGCAAAG GGAAAACAAG CCCTGATAAG CAAAAGCAGT CCCCACAGGT 9840
GTCTGGGCAT CTGGCATGGG TGCGGTGGCC AGCACCCTAC AGGGGCTTCC TATGCGCTTG 9900
GGATACACAG GGGCTGGAGG CTTCCGAGGA GTTTGTCTTG AACATCTGGA GCTTTGAATT 9960
TGTCCCACTG ACCTTTTCTT TCAGCAAGTT CCCCTGAAAT TTGGGCTGCT GCTTGGGTGA 10020
ATATCCCAGG ATGGGGGTTC CATTCTAGGA GTGGACTGGC AGGCTGAGCC TCCCATGGAG 10080
CTGATCCAGC CAGGATACAG AGAAGGGGAG GCAAAGGCTG AGACAGAACG AGCTTGAGAG 10140
CGGAGGCGGA ACTCTTGTCT CCTGGTGGCC TTGAGCATTT CACAATAGGG GGATAAAGGA 10200
TAGGAGCAGA AAAGTGGGGC TGACTTCAGA AATGGGGTCC TCTAGAGCTC ACGGCAGGGT 10260
GTTAGATTGG AGTGGGACCT TAGTGGAGGT GAGCCTTAGA GGGAAAAGTC TCCAGACCAA 10320
TCCAGGCCCC CTCTTCTATC CGGGGGCCCC TCTTCTATCC AGGGCCCCTC TTCTGTCTGG 10380
GAGCCCCTCT TCTATCTGGG GCCTCATGCA GTGGGGCCTA GGGGAGGTTC TCTGAGGACT 10440
TGGCCTTGAT GACAGGGTGG CTGGAGGAAT CAGAACGGTC AGACCTTCTT TGACCTGCGG 10500
GCACCTTTAG TTGGAATGCT CAGGCCTCGG ATGGTGGAGG GGGCTCTTGC AGGTGGGGAC 10560
TGGGGTGGCG GGGAGGAGGC TGTATGGCCG CCATATCTCC TTTGGCTGGG GGCGTCAGGG 10620
CTGGAGAGGT GTGAAGAGTC CCTGAGGCCT CGATGCATCT CACTCCAGCT CACCAGGTCT 10680
GCATTTGCCC GTCCCCAGCT CCTGCTGCCA CCCCCGGCCG TTTTAGGCAC TTGGCTCCCT 10740
TGGCCCAGAG GAGCTTGCCT CACAGGCCTG TGCACCTCTG ACCCCTGTGA ACCAGTTTTC 10800
CTTTGTGCCT CCACAGCCAC AGCCTGGCAA CTCTGATCAG GAAAGTGAGG AACAGCAACA 10860
ATTCCGGAAC ATTTTCAAGC AGATAGCAGG AGATGTGAGT ACCTCCAAGC CCAGGACGCC 10920
CACAGGTGCT TCCTTCTCTC CTGGATTAAC TGCTCAGATT ACCAATTATT TCATTATTGT 10980
TTGGTAGAGC TCACTTTGGA CTTCGGTGGA GCCAGGGGAT GTGTGCGTAG CACACAAATC 11040
CACAAGCCCT TGAGTTTTGG ACTGCCACGT CTGCTGGGGG GCTCAGAGGC CTTTTTGCTC 11100
TGAGCTGCGG ACGGTGGTCC TGATAGCTGA GGTGCAGTAT CTGGCCCCCT GTCTTCCTCA 11160
GAAAAGCCCC AGCTTCCCAT GACATAATAG CACCGACAGG GATTTTACAA ACACAGCCAG 11220
GTGGAATTTG TTTTGCAAAG TGTCCGCGCC AGGAGCTGCT GTACTCCTGA AGCATGAGCC 11280
TCCTCTCCCT TCCTCCTCAG GACATGGAGA TCTGTGCAGA TGAGCTCAAG AAGCTCCTTA 11340
ACACAGTCGT GAACAAAGGT GAGTTGCTCA AACCAAATGG QGGIGQGCTG GGTGGGGAGT 11400
CCCGTTGTCT CAAAGGAGCT CCTCACTCTT CTCCATCCCC CCAGACAAGG ACCTGAAGAC 11460
ACACGGCTTC ACACTGGAGT CCTGCCGTAG CATGATTGCG CTCATGGATG TATCCTTCCT 11520
GCCGCCCCTT CCCGACCCTC TGTCATCAGC CCACGGGGGC CAAGGCAACA TACAGGGTGC 11580
CCAGTCAGGC AAAGGGCCGT AATTTGTGCC CAGGGAAACT TAAGGAGACC CTGATTCAGA 11640
ACATCTTGGA TACTCGTCTG AAAGGGGTTG TTAGAGGCGG AAGGGGAGGA TGTTGGGTTG 11700
TAACTGCCCT AACCCCTGTG CTTCTCTCAG GCCTGGGATC GTGCCCAAGG AAAAGTGGTC 11760
CTTAGGAGAG CGGCTCCTGG GTTACAGAGT AGGCGCAATG TCTGACTGGT GGTGGAGTGG 11820
AGGGGAGGGT TAAATAGTAC AAGAGGGCAG TGGGTAGGAC AGCCCGGAGT CTCCTAGACC 11880
CTCCCTCCAA ATCGAGGGGG ATTTTGCTGT GTGCTGTGTA GCCCTGACCT CCCTCCTCCA 11940
GACAGATGGC TGTGGAAAGC TCAACCTGCA GGAGTTCCAC CACCTCTGGA ACAAGATTAA 12000
GGCCTGGCAG GTGGGAAGAG AAAATGAAGC GTGGGAGTCA AGAATGGGGT TGATTTGGAG 12060
ATTCAGTGTG TGACCTCCAT CCTCAAATTT TCTATTGCCA GAAAATTTTC AAACACTATG 12120
ACACAGACCA GTCCGGCACC ATCAACAGCT ACGAGATGCG AAATGCAGTC AACGAGGCAG 12180
GTGCTGAGAA GGAAGGGGTG TCAGGGATGT GGACCCGAGA CGGTGGGAGC AGGAATGGGA 12240
GGGGACTAGC TACTAGGGCC CCACTAGAGA AGGAGAGGGA AAGGGCTTCT CACTTTCCCT 12300
TCCCAGGTCA CAGAGTGTCC GAGAGGCAGG GAAAATAGAA GACAGGCCCA AGGCCTCCAG 12360
CTCCACGTCC ACCTCTAACA TGGTCCCCTC CACAGGATTC CACCTCAACA ACCAGCTCTA 12420
TGACATCATT ACCATGCGGT ACGCAGACAA ACACATGAAC ATCGACTTTG ACAGTTTCAT 12480
CTGCTGCTTC GTTAGGCTGG AGGGCATGTT CAGTAAGTGG GAGAGGGGGG CTGCCCTCTG 12540
CTCTCTTGCA 
ATCCAGGCTG 
GTTACTGGTG 
GGTCCTCTGA 
CTGTGCATTC 
AGCTCAACGT 
AAGGTGGAGG 
GGTGGAGGGA 
TGCACTTGAG 
CCTTCTTAGG 
CTGGGGGTCT 
GGGGGGGGGG 
GGCTCCAGCT 
ACTCAGGATT 
CTACACCCCT 
TACCCATCCT 
AACAAACCCT 
CGGTTCTGAA 
GGTGGCACTC 
GGGCCAGGAA 
TAGGGCTGGT 
ATCTAAATAA 
ATCAGCCATG 
CAATGAAAAA 
CTTGAGCATA 
TGTTGCTGTA 
CTGAATAATG 



GGGGCAGTTG 
AACAAGGGCC 
ATTCTCTGCC 
GGGGAAGTTA 
TTTCACAGGA 
TCTGGAGGTA 
GGTCAACGGG 
AGGGATAGGA 
GCCGAAAAGG 
GAAGATCTAG 
GACCTGCTGG 
GGGGGTCACT 
CACCATGTAT 
TCACTTTCAC 
ACAGGCTTCC 
TCATCGGTCA 
TGTCCCTTTG 
GCGAGTGCTC 
AGCACCTCCT 
CCAAACCAGC 
TACTTTGGGC 
AGGCATGTGT 
CATGACTGAA 
CACACACAAA 
AGAATGGCTC 
AAAGACATCA 
GAGTGGAGAT 



32/33 

TGGCAACAGG CATCTCACCT 
AATGACCTCT TTAGGCCCAG 
TGCACATCTT TGTGCTGATG 
CAGTAGTAGA GGCGGAGTGC 
GCTTCTCATG CATTTGACAA 
AAGCATAGGC ACAGCACATT 
GCGGACTGGA CCCAGGGTGT 
ACAGAACATG GAGGGAGGCT 
ACCTCTGCTC CCCCAGTCAC 
GAGAAAGGAA ACAGTAAGCC 
CACTGTTCCC TTTCCTCTTG 
CTTTTCTCAT CTACATTCTG 
CCCTGAACCA GGCTGGCCTC 
CCTCTATTTC CAAAGCCATT 
AGGCACCTCA TCAGTCATGT 
TGCCTAGCCT GACCCTTTAG 
CCATGTGGAG GAAAGTGCCT 
CTGCTTACCT TGCTCTAGGC 
TGTGCTAGAG CCCTCCATCA 
ACTGGGTTCT ACTGCTGTGG 
TGTCCAACTC ATAAGTTTGG 
ATGGCTGGTC CCCTTGTGTT 
TGGCTTCCAA TCATATACTC 
AACAAAATCT TGAATTTTGT 
AGATACTTTC CAAGACATAA 
AGAATAAATG GGGTCATGTA 
TGAGCTATCC TAGCTCCTCT 



GATAATCTCC 
AATGGGATGG 
AGGGACAGCA 
GCCTGTAACT 
GGATGGAGAT 
CCCCCTACAC 
GCTCCTCATT 
CAGCAGGCTC 
TTGATGCGGG 
ACTGCTTCTT 
CCCCGTAAGA 
ATCTTGGGAC 
ATCCAAAGCC 
TACCTCAAAG 
TCCTCCTCCA 
TAAAGCAATG 
GCCTCTGGTC 
TGTCTGCAGA 
CCTTCACGCT 
GGTAAACTAA 
CTGCATTTTG 
TTGTTGTCTC 
ACCTATCACC 
AATCATGCCT 
AAGGAAGGCA 
CAACGGGAGG 
GCTCACTAAC 



AGTCTGCTCC 
CAAAGGGAGG 
CTGGGCACAC 
GGCCTCTGGC 
GGTATCATCA 
ATTAAAACTC 
TCCACACAGT 
CCAGGACACA 
AAAACATGCA 
GGAAAATCTT 
TTCCTAGGGC 
TTCTTTCAGT 
ATGCAGGATC 
GAGCGAGCAG 
TTTTACCCCC 
AGGTAGGAAG 
CGAGCCGCCT 
AGCACCTGCC 
GTCCCACCAT 
CTCAGTGGAA 
AAAAAAGGTG 
ACATTTAGAT 
TACAAGAGAA 
ATTGCTATTT 
GAGGAATAGT 
GGCCGGTTAC 
TGACCTGTCG 
CATGACCGTG 
GCCTGCGGCA 
GCCAACGTGG 
TTAGTAAAAG 
AAGTCCACAG 
GCTCTCATCT 
GTGCTGGAGC 
TTTCCTCCTT 
AATCTGGGAA 



33/33 

GACAAAACCC TGAACGCAGC TGTTTGTTTG 
TATCTATAGG CATCCTGTGT TTTCCACCCA 
AAAGGGCTGG CCGTGAATAT GCAGACAAGG 
TACTTCATTT TCCTCTTGTA TTTGCTTCAT 
CTTTATACCA AAATGTAAGA AGGCTATTTG 
CATTT<^TTC TTCTAATCCA TATTCAATAT 
AGCTCTAGGG CATATATTTC TCTTAAATAG 
GACCCCCTCC TTTCCCAATT TATTTGGGTC 
ATGTAGTCAC CAGG 



CTAAACTTCT 

GTTTCCTTCT 

TAACGAAAGT 

TCTTGCTTCA 

CTTATAAACA 

TAAAAAATCA 

GAGAAAGATT 

ACTACCTTGA 



CTGGACCATG 

TCCTCGCTAA 

AAACCGTCAA 

CAAACTTACG 

TTTTGAGTCA 

GAAACCAAGG 

TTCAACAGCT 

ATTTAGAGTG 